Abstract

A Large Deviations Principle (LDP), demonstrated for occupancy
problems with indistinguishable balls, is generalized to the case in
which balls may be distinguished by a finite number of colors. The
colors of the balls are chosen independently of the occupancy process
itself. There are r balls thrown into n urns, with the probability of
a ball entering a given urn being 1/n (Maxwell-Boltzmann statistics).
The LDP applies with the scale parameter n going to infinity and the
number of balls increasing proportionally. It holds under mild
restrictions, the key one being that the coloring process by itself
satisfies an LDP. Hence the results include the important special cases
of deterministic coloring patterns and of colors chosen with fixed
probabilities independently for each ball.


1  Introduction 

In occupancy models which follow Maxwell-Boltzmann statistics, balls are
thrown into n urns with the probability of a ball entering a given urn being
1/n, independently of all other balls. References [1] and [4] develop sample
path large deviations principles for scaled occupancy processes in which the
time variable is (approximately) the number of balls thrown per urn, the
state is given by the fraction of urns which contain exactly i balls, and
the number of balls and urns are scaled up in fixed proportion. An LDP
is obtained for infinite-dimensional processes in [1], whilst [4] focuses on
processes with a finite number of occupancy levels, i = 0, ..., I, with
i = I+ for urns with more than I balls. Additionally, [4] provides explicit
solutions to the corresponding calculus of variations problem.

In this paper we consider a generalization of such occupancy models to
allow balls with more than one color. We focus on the case of two colors, as
the extension to any finite number of colors is straightforward. The overall
process can be regarded as the conjunction of two independent random
processes: an occupancy process, which determines which urn each ball
enters, and a second process that determines color. The coloring process can
be quite general and includes the important special case where each color
is picked independently and according to a fixed vector of probabilities (iid
coloring), as well as deterministic coloring patterns. Again time is scaled by
a factor of n, so that at time $t \approx k/n \in [0, \beta]$, k balls have
been thrown. The state of the process is the empirical measure, which records
the fraction of urns that contain i balls of color 1 and j balls of color 2,
for $0 \le i \le I+$ and $0 \le j \le J+$, where I+ and J+ correspond to more
than I balls and more than J balls, respectively. Thus $\Gamma^n_{i,j}(t)$ is
the fraction of urns containing i color 1 balls and j color 2 balls after
approximately nt balls have been thrown. In general the process
$\{\Gamma^n_{i,j}(t)\}$ will not be Markov unless the coloring process itself
is Markov.

There is a wide literature on occupancy problems, and the case of
distinguished classes of balls is a common generalization [7, 8]. A recent
motivating application for colored occupancy problems is the analysis of
wavelength conversion in the optical packet switch described in [6]. In each
time slot, a random collection of packets (balls) arrives on a set of input
fibers and must be routed onto a set of output fibers (urns). The packets on
each fiber are wavelength-multiplexed on a finite number of channels (colors).
In the absence of wavelength conversion, packets must use the same channel on
the input and output fiber. If multiple packets of the same channel belong to
the same output fiber, the excess packets must be converted to a different
channel or discarded. Typical quantities to be computed include the
probability of requiring a large number of wavelength converters and the
probability of discarding a large number of packets. The problem was
approached with single color large deviation analysis in [6]; the results
there give only an upper bound on the true number of converters because
packets discarded due to fiber capacity were also considered to require
conversion. By contrast, the multi-colored analysis contains the information
needed to avoid this overestimation by taking fiber capacity into account.
The multi-colored approach may also be useful in studying packet switches
with constrained wavelength conversion patterns.

An example in statistics is the problem of coincidence, which may be
illustrated as follows; see also [7]. Let days in a given period be urns,
and let there be three colors indicating the asthma attacks (balls) of three
individuals. A ball goes into an urn if the individual has an asthma attack
on that day. If there is no common cause (e.g., a pollution event), the
attacks of the three individuals are independent, and we are interested in
the probability that the actual distribution could have arisen.

We derive the LDP for the colored occupancy processes by using the
representation theorem for the scaled log moment generating functions for
measurable functions of sample paths; see [3]. The representation is as an
infimum over measures of the sum of a relative entropy cost and a terminal
cost. As discussed in Section 2, there is a natural split of the relative
entropy cost between a cost for occupancy and one for the coloring process.
The local rate function in the single-color case is the relative entropy
$R(\theta(t) \| \Gamma(t))$, where $\theta_i(t)$ is the rate at which balls
enter $i$-occupied urns when the time is $t$ [4]. The corresponding expression
in the colored case is a weighted sum of relative entropy terms
$\dot{x}_1 R(\theta^1(t) \| \Gamma(t)) + \dot{x}_2 R(\theta^2(t) \| \Gamma(t))$,
where $\theta^k_{i,j}(t)$ is the normalized rate at which balls of color $k$
enter urns that presently contain $i$ balls of color 1 and $j$ balls of
color 2, and where $x_k(t)$ is the number of color $k$ balls per urn thrown
by time $t$. The overall local rate function also includes an additional term
not present in the single color case, namely the local rate function for the
coloring process itself.

In [4], the large deviations upper bound followed from the results in [2],
but in the present case this is no longer true since the occupancy process need
not be Markov. Instead, we present a direct proof based on weak convergence
which only assumes a sample path LDP for the coloration process.

The most significant obstacle to obtaining an LDP occurs in the proof of
the large deviations lower bound. The difficulty here is the singular behavior
of the relative entropy cost when any element of $\Gamma(t)$ approaches zero.
In [4], this difficulty is met in two steps. The boundary is avoided
everywhere, except at the initial point, using a perturbation argument which
relies on the joint convexity of the local rate function. A simple "filling"
construction is then employed in the vicinity of the initial point. The
construction is essentially equivalent to the construction of a change of
measure with properly bounded Radon-Nikodym derivative that would be needed
in the traditional approach to the large deviation lower bound. In the
present setting, a more delicate perturbation argument is required because
the local rate function is not always jointly convex, as may be verified with
simple examples (e.g. coloring via a two-state Markov chain). In addition,
the filling construction is replaced by one based on time-reversal.

A striking feature of the single color occupancy analysis is that the
calculus of variations problem can be solved to obtain explicit extremal
trajectories, and the rate function for the final occupancy state can be
computed directly by solving a fixed-point equation [4]. The same is true in
the present case, and a companion paper [5] is in preparation that
generalizes the calculus of variations analysis to colored balls.

An  outline  of  the  paper  is  as  follows.  In  Section  2  a  precise  formulation 
of  the  model  is  given  and  the  main  results  of  the  paper  -  the  upper  and 
lower  Laplace  principles  -  are  stated.  These  bounds  are  equivalent  to  the 
large  deviation  upper  and  lower  bounds.  The  section  also  presents  three 
important  special  cases  for  coloration  processes.  In  Section  3  the  upper 
bound  is  established  and  Section  4  establishes  some  properties  of  the  rate 
function  which  will  be  needed  in  the  proof  of  the  lower  bound  in  Section  5. 


2  Preliminaries  and  Main  Result 


We construct an urn model with colored balls as follows. Balls are thrown
into one of $n$ urns sequentially. The throwing process is modeled by a
collection of independent and identically distributed (iid) random variables
$\{X^n_l, l = 1, \ldots, \lfloor nT \rfloor + 1\}$, where $\lfloor a \rfloor$
denotes the integer part of the scalar $a$. Each $X^n_l$ is uniformly
distributed on the set $\{1, \ldots, n\}$, with each value of the set
corresponding to an urn. Thus a total of $N_n = \lfloor nT \rfloor + 1$ balls
are thrown. There is also a coloration process designated by
$Y^n_l \in \{1, 2\}$. At each discrete time $l$ a ball is assigned color
$Y^n_l$, and then placed into urn number $X^n_l$.

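We form empirical measures $\Gamma^n_{i,j}(t)$ as follows. If
$i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J\}$, then

\[
\Gamma^n_{i,j}(l/n) = \frac{1}{n} \sum_{m=1}^{n}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 1\}} = i\right\}}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 2\}} = j\right\}} .
\]

In other words, $\Gamma^n_{i,j}(l/n)$ is the fraction of cells containing
exactly $i$ color 1 and $j$ color 2 balls when $l$ balls have been thrown.
Similarly,

\[
\begin{aligned}
\Gamma^n_{I+,j}(l/n) &= \frac{1}{n} \sum_{m=1}^{n}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 1\}} > I\right\}}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 2\}} = j\right\}}, \\
\Gamma^n_{i,J+}(l/n) &= \frac{1}{n} \sum_{m=1}^{n}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 1\}} = i\right\}}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 2\}} > J\right\}}, \\
\Gamma^n_{I+,J+}(l/n) &= \frac{1}{n} \sum_{m=1}^{n}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 1\}} > I\right\}}
  1_{\left\{\sum_{r=1}^{l} 1_{\{X^n_r = m,\, Y^n_r = 2\}} > J\right\}} .
\end{aligned}
\]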

By definition $\Gamma^n_{0,0}(0) = 1$, and $\Gamma^n_{i,j}(0) = 0$ for all
other values of $(i,j)$. (One can also consider other initial conditions,
with only simple notational changes in the results to be stated below. When
extended to accommodate general initial conditions, the large deviation
results we will prove are uniform in the initial condition, in the sense used
in [4].) The definition of $\Gamma^n$ is extended to all $t \in [0, T]$ not of
the form $l/n$ by piecewise linear interpolation. Let $U$ denote the set of
all probability measures on
$\{0, 1, \ldots, I, I+\} \times \{0, 1, \ldots, J, J+\}$. The processes
$\Gamma^n$ are considered to take values in the space of continuous functions
$S = C([0, T] : U)$, equipped with the usual supremum norm.
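The construction of the empirical measure can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the function names `empirical_measure` and `simulate` are hypothetical, urns are indexed from 0 rather than 1, the overflow classes I+ and J+ are stored at array indices I + 1 and J + 1, and the colors are drawn iid with probability `p1` of color 1 (one of the coloration processes considered later), which is just one admissible choice.

```python
import random

def empirical_measure(urns, colors, n, I, J):
    """Return gamma[i][j], the fraction of the n urns holding i color-1
    balls and j color-2 balls. Index I + 1 encodes the class I+ (more
    than I balls) and index J + 1 encodes J+.

    urns[r] in {0, ..., n-1} and colors[r] in {1, 2} describe throw r.
    """
    count1 = [0] * n  # color-1 balls per urn
    count2 = [0] * n  # color-2 balls per urn
    for urn, color in zip(urns, colors):
        if color == 1:
            count1[urn] += 1
        else:
            count2[urn] += 1
    gamma = [[0.0] * (J + 2) for _ in range(I + 2)]
    for m in range(n):
        i = min(count1[m], I + 1)  # truncate at the overflow class
        j = min(count2[m], J + 1)
        gamma[i][j] += 1.0 / n
    return gamma

def simulate(n, T, p1, I, J, seed=0):
    """Throw floor(n*T) + 1 balls uniformly into n urns with iid colors."""
    rng = random.Random(seed)
    num_balls = int(n * T) + 1
    urns = [rng.randrange(n) for _ in range(num_balls)]
    colors = [1 if rng.random() < p1 else 2 for _ in range(num_balls)]
    return empirical_measure(urns, colors, n, I, J)
```

Calling `simulate(1000, 1.0, 0.5, 2, 2)` returns a 4 x 4 array of nonnegative entries summing to one, i.e. a point of the set of probability measures on the truncated occupancy classes at the terminal time.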

We wish to analyze the large deviation asymptotics of these processes,
when the underlying coloration process satisfies a large deviation principle
and is independent of the urn selection. To this end, it is convenient to use
the Laplace formulation. Let $F$ be any bounded and continuous function
on $S$. The processes $\Gamma^n$ are said to satisfy a Laplace principle with
rate function $I$ if the following two conditions hold:


•  For each $M < \infty$, the set $\{\Gamma : I(\Gamma) \le M\}$ is compact
in $S$.

•  For every such $F$,

\[
\lim_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  = \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right] .
\]

Since the processes $\Gamma^n$ take values in a Polish space, the notions of
Laplace principle and large deviation principle are equivalent
[3, Corollary 1.2.5]. Cumulative coloration processes
$\{x^n, n \in \mathbb{N}\}$ are defined for $t = l/n$ by

\[
x^n_1(l/n) = \frac{1}{n} \sum_{r=1}^{l} 1_{\{Y^n_r = 1\}}, \qquad
x^n_2(l/n) = \frac{1}{n} \sum_{r=1}^{l} 1_{\{Y^n_r = 2\}} .
\]

These definitions are also extended to $t \in [0, T]$ not of the form $l/n$ by
piecewise linear interpolation. Define the set of functions $\mathcal{T}$ by
$x = (x_1, x_2) \in \mathcal{T}$ if

•  $x_k(\cdot)$ is increasing and continuous with $x_k(0) = 0$ for $k = 1, 2$,

•  $x_1(t) + x_2(t) = t$ for all $t \in [0, T]$.

We consider this set of functions as endowed with the usual sup norm
topology, and make the following assumption.

Assumption 2.1 The sequence of coloration processes
$\{x^n, n \in \mathbb{N}\}$ satisfies a large deviation principle on
$\mathcal{T}$ with the rate function $J$.

Since $\mathcal{T}$ is also a Polish space, as noted previously this is
equivalent to the statement that $\{x^n\}$ satisfies the corresponding Laplace
principle with rate function $J$: $J$ has compact level sets, and for all
bounded and continuous functions $G : \mathcal{T} \to \mathbb{R}$,

\[
\lim_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n G(x^n)\right]
  = \inf_{x \in \mathcal{T}} \left[ J(x) + G(x) \right] .
\]


Additional  assumptions  on  the  coloration  processes  will  be  introduced  below. 
In  particular,  a  mild  structural  assumption  on  J  must  be  assumed  in  order 
to  prove  the  large  deviation  lower  bound. 

We next describe a few typical coloration processes. The relative entropy
function will be used for this purpose, and indeed throughout the paper. For
two probability measures $\alpha$ and $\beta$ on a Polish space $A$, the
relative entropy of $\alpha$ with respect to $\beta$ is defined by

\[
R(\alpha \| \beta) = \int_A \left( \log \frac{d\alpha}{d\beta} \right) d\alpha
\]

whenever $\alpha$ is absolutely continuous with respect to $\beta$ (and with
the convention that $0 \log 0 = 0$). In all other cases we set
$R(\alpha \| \beta) = \infty$.
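For measures on a finite set, which is the situation for the occupancy classes used throughout, the definition reduces to a finite sum. The sketch below is illustrative only; the function name `relative_entropy` and the representation of measures as lists of probabilities are assumptions made for the example.

```python
import math

def relative_entropy(alpha, beta):
    """R(alpha || beta) for probability vectors on the same finite set.

    Uses the convention 0 log 0 = 0, and returns infinity when alpha is
    not absolutely continuous with respect to beta.
    """
    total = 0.0
    for a, b in zip(alpha, beta):
        if a == 0.0:
            continue          # 0 log 0 = 0
        if b == 0.0:
            return math.inf   # alpha charges a point that beta does not
        total += a * math.log(a / b)
    return total
```

For instance, `relative_entropy([1.0, 0.0], [0.5, 0.5])` equals log 2, while swapping the arguments gives infinity because absolute continuity fails.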


Example 2.1 Suppose we color the balls to achieve a deterministic fraction
$p_k$ of color $k$, with $p_k \in (0, 1)$. More precisely, if $N^k_{l-1}$
balls of color $k$ have been thrown in the first $l - 1$ throws (with
$N^1_{l-1} + N^2_{l-1} = l - 1$), and if $N^1_{l-1}/n < p_1 l/n$, then we
color the $l$th ball 1, and otherwise color it 2. The rate function for the
corresponding processes $\{x^n, n \in \mathbb{N}\}$ is quite simple:

\[
J(x) = \begin{cases}
  0 & \text{if } x_k(t) = p_k t, \\
  \infty & \text{else.}
\end{cases}
\]
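The deterministic rule of Example 2.1 is easy to state in code. This is a minimal sketch, not the authors' implementation; the function name `deterministic_colors` is hypothetical, and the common factor 1/n in the comparison is cancelled.

```python
def deterministic_colors(num_balls, p1):
    """Color ball l with 1 when the number of color-1 balls among the
    first l - 1 throws is below p1 * l, and with 2 otherwise."""
    colors = []
    n1 = 0  # color-1 balls among the throws made so far
    for l in range(1, num_balls + 1):
        if n1 < p1 * l:
            colors.append(1)
            n1 += 1
        else:
            colors.append(2)
    return colors
```

With `p1 = 0.5` the rule alternates colors, and in general the running count of color-1 balls stays within one ball of `p1 * l`, so the scaled coloration path stays within 1/n of the line with slope `p1`. This is consistent with a rate function that is zero on that line and infinite elsewhere.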


Example 2.2 An alternative coloring scheme is to select the color in an iid
fashion, with probability $p_k$ of color $k$, where $p_k \in (0, 1)$. If $a$
is a probability vector define

\[
M(a) = R(a \| p),
\]

and in all other cases let $M(a) = \infty$. Then the rate function is

\[
J(x) = \begin{cases}
  \int_0^T M(\dot{x}_1(t), \dot{x}_2(t)) \, dt
    & \text{if $x$ is absolutely continuous,} \\
  \infty & \text{otherwise.}
\end{cases}
\]

Example 2.3 In our final example the color is determined by a two-state
ergodic Markov process. Let the underlying transition probabilities be
denoted $p_{k,l}$, $k = 1, 2$, $l = 1, 2$. Let $b$ denote the invariant
distribution, with $b_k \in (0, 1)$. Given a probability vector $a$, let
$q_{k,l}$ be any ergodic probability transition matrix with invariant
distribution $a$. Define

\[
M(a) = \inf \sum_{k=1}^{2} R\left(q_{k,\cdot} \,\|\, p_{k,\cdot}\right) a_k,
\]

where the infimum is over all such transition matrices $q$. In all other cases
set $M(a) = \infty$. Note that $M(a) = 0$ if and only if $a = b$. Here again
the rate function is written

\[
J(x) = \begin{cases}
  \int_0^T M(\dot{x}_1(t), \dot{x}_2(t)) \, dt
    & \text{if $x$ is absolutely continuous,} \\
  \infty & \text{otherwise.}
\end{cases}
\]

Before turning to the proof of the large deviation result, we introduce
the notation needed to define the rate function. Let $V$ be the set of real
$(I + 2) \times (J + 2)$ matrices, indexed over the set
$\{0, \ldots, I+\} \times \{0, \ldots, J+\}$, such that the sum of all
elements of each matrix is zero. Let $T^k : U \to V$ be defined by the
expressions

\[
T^1_{i,j}[a] = a_{i-1,j} - a_{i,j} 1_{\{i \le I\}}, \qquad
T^2_{i,j}[a] = a_{i,j-1} - a_{i,j} 1_{\{j \le J\}},
\]

where for convenience we define $a_{-1,j} = a_{i,-1} = 0$.
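The action of these transformations can be illustrated numerically. The sketch below assumes the reading in which the indicator switches off the outflow term only for the overflow class (no mass leaves I+ when a color-1 ball arrives, and likewise for J+), and in which the predecessor of I+ is the class I; these conventions, the function name `transition_operators`, and the storage of I+ and J+ at indices I + 1 and J + 1 are assumptions of the example.

```python
def transition_operators(a, I, J):
    """Return (t1, t2), where tk[i][j] is the net rate of change of the
    fraction of type-(i, j) urns when color-k balls enter urns at rates a.

    a[i][j] is indexed by i in 0..I+1 and j in 0..J+1.
    """
    t1 = [[0.0] * (J + 2) for _ in range(I + 2)]
    t2 = [[0.0] * (J + 2) for _ in range(I + 2)]
    for i in range(I + 2):
        for j in range(J + 2):
            # Color 1: inflow from type (i-1, j), outflow unless i is I+.
            inflow1 = a[i - 1][j] if i >= 1 else 0.0
            outflow1 = a[i][j] if i <= I else 0.0
            t1[i][j] = inflow1 - outflow1
            # Color 2: inflow from type (i, j-1), outflow unless j is J+.
            inflow2 = a[i][j - 1] if j >= 1 else 0.0
            outflow2 = a[i][j] if j <= J else 0.0
            t2[i][j] = inflow2 - outflow2
    return t1, t2
```

Under these conventions each output matrix sums to zero, since every unit of mass leaving one class enters its successor, which matches the requirement that the image lies in the set of zero-sum matrices.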

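Next, let $\Gamma \in S$ be given with $\Gamma_{0,0}(0) = 1$. Suppose there
are Borel measurable functions $\theta^k : [0, T] \to U$, $k = 1, 2$, and
$x \in \mathcal{T}$ such that for all $t \in [0, T]$,

\[
\Gamma(t) = \Gamma(0) + \int_0^t
  \left( \dot{x}_1 T^1[\theta^1] + \dot{x}_2 T^2[\theta^2] \right) ds .
\tag{2.1}
\]

Then $I(\Gamma)$ is defined by

\[
I(\Gamma) = \inf_{x, \theta} \left\{ \int_0^T
  \left[ \dot{x}_1 R(\theta^1 \| \Gamma) + \dot{x}_2 R(\theta^2 \| \Gamma) \right] ds
  + J(x) \right\},
\]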
where the infimum is over all such $\theta^k$ and $x$ that satisfy (2.1). If
rates satisfying (2.1) exist with $I(\Gamma) < \infty$ then we say that
$\Gamma$ is a valid occupancy path. If such rates do not exist, we set
$I(\Gamma) = \infty$. In Section 4 we note that for every valid occupancy
path, there exist rates $x$ and $\theta^k$ which achieve the infimum.

We interpret $\theta^k_{i,j}(t)$ as the rate at which balls of color $k$ are
thrown into cells that at time $t$ contain $i$ balls of color 1 and $j$ balls
of color 2, where the rates are normalized to give a probability measure for
each $k$. We follow our usual convention that $i = I+$ refers to more than
$I$ balls, and likewise for $j = J+$. These normalized rates are modulated by
the color selection process $x$ so that $\dot{x}_k \theta^k$ represents the
true rate at which balls of color $k$ enter urns of various occupancy classes.
Finally, the transformations $T^k[a]$ represent the rate of change in $\Gamma$
induced by balls of color $k$ entering urns at the rates given by $a$.

In the next three sections, under different assumptions for the upper
and lower bounds, we will prove the Laplace principle for this urn model. In
particular, in Section 3 we will prove

\[
\liminf_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  \ge \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right],
\]

and in Section 5 we will prove

\[
\limsup_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  \le \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right] .
\]

These bounds are equivalent to the large deviation upper and lower
bounds, respectively [3, Corollary 1.2.5]. In Section 4 we prove various
properties of the rate function $I$, and in particular show that $I$ has
compact level sets. Although all statements and proofs are for the case of 2
colors, there are obvious extensions to the case of any finite number of
colors.

To prove these bounds it will be convenient to use a representation
for exponential integrals. Let $\mathcal{X}^n$ and $\mathcal{Y}^n$ denote the
product space of $N_n = \lfloor nT \rfloor + 1$ copies of $\{1, \ldots, n\}$
and $\{1, 2\}$, respectively. Let $\Pi^n$ denote product measure on
$\mathcal{X}^n$, where each marginal of $\Pi^n$ is $\pi^n$, the uniform
distribution on $\{1, \ldots, n\}$, and let $\Lambda^n$ denote the
distribution that is induced on $\mathcal{Y}^n$ by
$\{Y^n_l, l = 1, \ldots, N_n\}$. Let $\mu^n$ denote any probability measure
on $\mathcal{X}^n \times \mathcal{Y}^n$. Suppose that
$\{\bar{X}^n_l, l = 1, \ldots, N_n\}$ and $\{\bar{Y}^n_l, l = 1, \ldots, N_n\}$
(on the canonical probability space $\mathcal{X}^n \times \mathcal{Y}^n$ and
with expectation operator $\bar{E}^n$) have the joint distribution $\mu^n$,
and that $\bar{\Gamma}^n$ and $\bar{x}^n$ are constructed from
$\{\bar{X}^n_l, l = 1, \ldots, N_n\}$ and $\{\bar{Y}^n_l, l = 1, \ldots, N_n\}$
in exactly the same way that $\Gamma^n$ and $x^n$ are constructed from
$\{X^n_l, l = 1, \ldots, N_n\}$ and $\{Y^n_l, l = 1, \ldots, N_n\}$. Then
[3, Proposition 1.4.2]

\[
-\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  = \inf_{\mu^n} \bar{E}^n \left[ \frac{1}{n}
    R\left(\mu^n \,\|\, \Pi^n \otimes \Lambda^n\right) + F(\bar{\Gamma}^n) \right] .
\tag{2.3}
\]

The process $\bar{\Gamma}^n$ is an urn model with a "biased" or "twisted"
distribution. The representation equates the normalized log of the
exponential integral with a variational problem, in which we minimize the
expected value of the functional $F$ under the twisted distribution, plus a
relative entropy "cost" to achieve the particular twist.

We next present an alternative expression for the relative entropy which
reflects the natural relations between the underlying measures. Suppose that
$\mu^n$ is decomposed into the following product of conditional distributions:

\[
\begin{aligned}
\mu^n(dx_1, \ldots, dx_{N_n}, dy_1, \ldots, dy_{N_n})
  &= \lambda^n(dy_1, \ldots, dy_{N_n}) \\
  &\quad \times \mu^n_{x,1}(dx_1 \mid y_1, \ldots, y_{N_n}) \cdots
    \mu^n_{x,N_n}(dx_{N_n} \mid x_1, \ldots, x_{N_n - 1}, y_1, \ldots, y_{N_n}) .
\end{aligned}
\]

Define the random measures

\[
\bar{\mu}^n_l(dx_l) = \mu^n_{x,l}\left(dx_l \mid \bar{X}^n_r, r = 1, \ldots, l - 1,\;
  \bar{Y}^n_r, r = 1, \ldots, N_n\right) .
\]

Thus $\bar{\mu}^n_l$ is the distribution of the cell into which the $l$th ball
is thrown, given the outcome of all previous throws and the colors of all the
balls. Using the fact that $\Pi^n$ is product measure and the chain rule for
relative entropy [3, Theorem C.3.1], we have

\[
R\left(\mu^n \,\|\, \Pi^n \otimes \Lambda^n\right)
  = \bar{E}^n \left[ \sum_{l=1}^{N_n} R\left(\bar{\mu}^n_l \,\|\, \pi^n\right)
    + R\left(\lambda^n \,\|\, \Lambda^n\right) \right] .
\tag{2.4}
\]


This  representation  separates  the  total  relative  entropy  into  a  contribution 
due  to  the  twisting  of  the  coloration  distribution,  and  a  sum  of  contributions 
due  to  twisting  of  the  distribution  of  the  individual  throws,  conditioned  on 
the  coloration  process  and  all  previous  throws. 


3  The  Large  Deviation  Upper  Bound 

In this section we prove

\[
\liminf_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  \ge \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right],
\tag{3.1}
\]

which corresponds to the large deviation upper bound. Since in the
occupation measure problem we do not distinguish between cells that contain
the same number of balls of the various colors, it makes sense to rewrite the
relative entropy one last time. Given $\{\bar{X}^n_r, r = 1, \ldots, l - 1\}$
and $\{\bar{Y}^n_r, r = 1, \ldots, N_n\}$, we know that
$n \bar{\Gamma}^n_{i,j}(l/n)$ is the number of cells that contain $i$ balls of
color 1 and $j$ balls of color 2. For $(i,j)$, let $K_{i,j}$ denote the set of
cells of the corresponding type, and let
$|K_{i,j}| = n \bar{\Gamma}^n_{i,j}(l/n)$ denote the number of elements of
$K_{i,j}$. Let $\nu^n_{i,j}(l/n)$ denote the total probability assigned to
cells of this type by $\bar{\mu}^n_l$ (the definition being irrelevant when
$|K_{i,j}| = 0$):

\[
\nu^n_{i,j}(l/n) = \bar{\mu}^n_l(K_{i,j})
  = \sum_{m \in K_{i,j}} \bar{\mu}^n_l(\{m\}) .
\]


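The convexity of $x \log x$ then implies the following bound:

\[
\begin{aligned}
\bar{E}^n \left[ R\left(\bar{\mu}^n_l \,\|\, \pi^n\right) \right]
  &= \bar{E}^n \left[ \sum_{i=0,j=0}^{I+,J+} \sum_{m \in K_{i,j}}
      \bar{\mu}^n_l(\{m\}) \log \frac{\bar{\mu}^n_l(\{m\})}{\pi^n(\{m\})} \right] \\
  &\ge \bar{E}^n \left[ \sum_{i=0,j=0}^{I+,J+} \nu^n_{i,j}(l/n)
      \log \frac{\nu^n_{i,j}(l/n)}{|K_{i,j}|/n} \right] \\
  &= \bar{E}^n \left[ R\left(\nu^n(l/n) \,\|\, \bar{\Gamma}^n(l/n)\right) \right] .
\end{aligned}
\tag{3.2}
\]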
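The convexity step says that aggregating urns by type can only reduce the relative entropy against the uniform distribution, with equality when the twisted measure is uniform within each type. A small numerical check of this statement is sketched below; the helper names `relative_entropy` and `group_by_type`, and the particular four-urn example in the usage note, are illustrative assumptions.

```python
import math

def relative_entropy(alpha, beta):
    """R(alpha || beta) on a finite set, with 0 log 0 = 0."""
    total = 0.0
    for a, b in zip(alpha, beta):
        if a == 0.0:
            continue
        if b == 0.0:
            return math.inf
        total += a * math.log(a / b)
    return total

def group_by_type(measure, types, num_types):
    """Aggregate a measure on urns into a measure on urn types.

    types[m] is the type index of urn m.
    """
    grouped = [0.0] * num_types
    for m, k in enumerate(types):
        grouped[k] += measure[m]
    return grouped
```

For example, with four urns of types `[0, 0, 1, 1]`, uniform measure `[0.25] * 4`, and twisted measure `[0.4, 0.1, 0.3, 0.2]`, the relative entropy between the grouped measures is zero (both group to `[0.5, 0.5]`) while the urn-level relative entropy is strictly positive.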


The inequality above becomes an equality when the measure $\bar{\mu}^n_l$
puts the same weight on urns of the same type, and thus one would expect this
property to hold for the measure that achieves the minimum in the variational
representation. For each $t \in [0, N_n/n]$ define $\nu^n(t) = \nu^n(l/n)$ if
$t \in [l/n, l/n + 1/n)$. Let $\tilde{\Gamma}^n$ denote the piecewise constant
(rather than piecewise linear) interpolant:

\[
\tilde{\Gamma}^n(t) = \bar{\Gamma}^n(l/n) \quad \text{for } t \in [l/n, l/n + 1/n) .
\]

Note that if $\bar{\Gamma}^n$ converges uniformly to $\bar{\Gamma}$, then so
does $\tilde{\Gamma}^n$.

For a $U$-valued process $\eta$ and $x \in \mathcal{T}$, we define increasing
processes $(\eta \otimes x_k)_{i,j}$, $i \in \{0, \ldots, I, I+\}$,
$j \in \{0, \ldots, J, J+\}$ by

\[
(\eta \otimes x_k)_{i,j}(t) = \int_0^t \eta_{i,j}(s) \, \dot{x}_k(s) \, ds .
\]

When $x_k(T) > 0$ and $(\eta \otimes x_k)/x_k(T)$ appears in the relative
entropy function, it is interpreted as the probability measure on
$\{0, \ldots, I+\} \times \{0, \ldots, J+\} \times [0, T]$ that assigns to the
set $A \times B$ mass

\[
\frac{1}{x_k(T)} \sum_{(i,j) \in A} \int_B \eta_{i,j}(s) \, \dot{x}_k(s) \, ds .
\]


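Theorem 3.1 Define the processes $\bar{\Gamma}^n$, $\tilde{\Gamma}^n$,
$\bar{x}^n_k$, $k = 1, 2$ and $\nu^n$ as above for the given measure $\mu^n$.
Then the collection

\[
\left\{ \left( \bar{\Gamma}^n, \bar{x}^n_k, \tilde{\Gamma}^n \otimes \bar{x}^n_k,
  \nu^n \otimes \bar{x}^n_k \right), k = 1, 2, n \in \mathbb{N} \right\}
\]

is tight. Thus given any subsequence there exists a further subsequence which
converges in distribution to processes $\bar{\Gamma}$, $\bar{x}_k$, $A^k$,
$\xi^k$, $k = 1, 2$ defined on a probability space with expectation operator
$\bar{E}$. These limit processes have the following properties.

1. Each process $\bar{x}_k$ is absolutely continuous (w.p.1), with derivative
in $t$ denoted by $\dot{\bar{x}}_k$.

2. Each process $\xi^k$ can be decomposed in the form

\[
\xi^k = \bar{\theta}^k \otimes \bar{x}_k,
\]

where the measurable process $\bar{\theta}^k$ takes values in $U$.

3. Each process $A^k$ can be decomposed in the form

\[
A^k = \bar{\Gamma} \otimes \bar{x}_k .
\]

4. The relation (2.1) holds, with $\Gamma, x_1, x_2, \theta^1, \theta^2$
replaced by $\bar{\Gamma}, \bar{x}_1, \bar{x}_2, \bar{\theta}^1, \bar{\theta}^2$.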

Proof: It is easy to see that the processes $\bar{\Gamma}^n$, $\bar{x}^n_k$,
$\tilde{\Gamma}^n \otimes \bar{x}^n_k$, $\nu^n \otimes \bar{x}^n_k$,
$k = 1, 2$ are all uniformly (in $n$ and $\omega$) Lipschitz continuous.
Therefore the ensemble takes values in a compact set, which automatically
gives tightness, and hence convergence along subsequences. If a convergent
subsequence is fixed (with limit $\bar{\Gamma}, \bar{x}_k, A^k, \xi^k$,
$k = 1, 2$), the limit processes are also Lipschitz continuous, and hence
a.e. (in $t$) differentiable, w.p.1. It follows directly from the definitions
that $\sum_{i=0,j=0}^{I+,J+} \xi^k_{i,j}(t) = \bar{x}_k(t)$ for $t \in [0, T]$.
Since each component of $\xi^k_{i,j}(t)$ is nondecreasing, there is a
measurable $U$-valued process $\bar{\theta}^k$ such that
$\xi^k_{i,j}(t) = \int_0^t \bar{\theta}^k_{i,j}(s) \, \dot{\bar{x}}_k(s) \, ds$.
The convergence of nondecreasing processes $\bar{x}^n_k \to \bar{x}_k$ and
continuity of $\bar{\Gamma}$ imply
$\bar{\Gamma} \otimes \bar{x}^n_k \to \bar{\Gamma} \otimes \bar{x}_k$. Since
$\bar{\Gamma}^n$ and hence also $\tilde{\Gamma}^n$ converge uniformly to
$\bar{\Gamma}$,
$\tilde{\Gamma}^n \otimes \bar{x}^n_k \to \bar{\Gamma} \otimes \bar{x}_k$. Thus
$A^k$ has the indicated decomposition.

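Finally we consider the last item in the theorem. Consider a component
$\bar{\Gamma}^n_{i,j}$. We assume that $i \in \{1, \ldots, I\}$,
$j \in \{1, \ldots, J\}$, and observe that a similar argument to the one used
below will give the analogous conclusion for all other cases. Let
$\mathcal{F}^n_l = \sigma\left(\bar{Y}^n_r, 1 \le r \le N_n,\;
\bar{X}^n_r, 1 \le r \le l\right)$. We can write

\[
\begin{aligned}
\bar{\Gamma}^n_{i,j}(l/n + 1/n) - \bar{\Gamma}^n_{i,j}(l/n)
  &= \frac{1}{n} 1_{\{\bar{Y}^n_{l+1} = 1\}} \left(
      1_{\{\bar{X}^n_{l+1} \text{ is an urn of type } (i-1,j) \text{ at time } l\}}
      - 1_{\{\bar{X}^n_{l+1} \text{ is an urn of type } (i,j) \text{ at time } l\}} \right) \\
  &\quad + \frac{1}{n} 1_{\{\bar{Y}^n_{l+1} = 2\}} \left(
      1_{\{\bar{X}^n_{l+1} \text{ is an urn of type } (i,j-1) \text{ at time } l\}}
      - 1_{\{\bar{X}^n_{l+1} \text{ is an urn of type } (i,j) \text{ at time } l\}} \right) \\
  &= \frac{1}{n} 1_{\{\bar{Y}^n_{l+1} = 1\}}
      \left[ \nu^n_{i-1,j}(l/n) - \nu^n_{i,j}(l/n) \right] \\
  &\quad + \frac{1}{n} 1_{\{\bar{Y}^n_{l+1} = 2\}}
      \left[ \nu^n_{i,j-1}(l/n) - \nu^n_{i,j}(l/n) \right]
      + \varepsilon^n_{i,j}(l/n),
\end{aligned}
\]

where $\{\varepsilon^n_{i,j}(l/n), l = 0, \ldots, N_n - 1\}$ is a martingale
difference with respect to $\mathcal{F}^n_l$ with
$\bar{E}^n\left[ \left(\varepsilon^n_{i,j}(l/n)\right)^2 \right] = O(1/n^2)$.
Thus

\[
\begin{aligned}
\bar{\Gamma}^n_{i,j}(t) - \bar{\Gamma}^n_{i,j}(0)
  &= (\nu^n \otimes \bar{x}^n_1)_{i-1,j}(t) - (\nu^n \otimes \bar{x}^n_1)_{i,j}(t) \\
  &\quad + (\nu^n \otimes \bar{x}^n_2)_{i,j-1}(t) - (\nu^n \otimes \bar{x}^n_2)_{i,j}(t)
    + g^n_{i,j}(t),
\end{aligned}
\]

where the process $g^n_{i,j}$ tends uniformly to zero on $[0, T]$. Therefore

\[
\begin{aligned}
\bar{\Gamma}_{i,j}(t) - \bar{\Gamma}_{i,j}(0)
  &= (\bar{\theta}^1 \otimes \bar{x}_1)_{i-1,j}(t) - (\bar{\theta}^1 \otimes \bar{x}_1)_{i,j}(t) \\
  &\quad + (\bar{\theta}^2 \otimes \bar{x}_2)_{i,j-1}(t) - (\bar{\theta}^2 \otimes \bar{x}_2)_{i,j}(t) .
\end{aligned}
\]

The last display is equivalent to the $(i,j)$-th element of (2.1). □

Theorem 3.2 Under Assumption 2.1,

\[
\liminf_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  \ge \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right] .
\]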
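Proof: Owing to the representation, it suffices to show that

\[
\liminf_{n \to \infty} \inf_{\mu^n} \bar{E}^n \left[ \frac{1}{n}
  R\left(\mu^n \,\|\, \Pi^n \otimes \Lambda^n\right) + F(\bar{\Gamma}^n) \right]
  \ge \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right] .
\]

According to equations (2.4) and (3.2), we have the bound

\[
\frac{1}{n} R\left(\mu^n \,\|\, \Pi^n \otimes \Lambda^n\right)
  \ge \bar{E}^n \left[ \frac{1}{n} \sum_{l=1}^{N_n}
    R\left(\nu^n(l/n) \,\|\, \bar{\Gamma}^n(l/n)\right)
    + \frac{1}{n} R\left(\lambda^n \,\|\, \Lambda^n\right) \right] .
\]

Using the chain rule ([3, Theorem C.3.1]) again, the non-negativity of
relative entropy, and $T \le N_n/n$, we can write

\[
\begin{aligned}
\bar{E}^n \left[ \frac{1}{n} \sum_{l=1}^{N_n}
    R\left(\nu^n(l/n) \,\|\, \bar{\Gamma}^n(l/n)\right) \right]
  &\ge \bar{E}^n \left[ \int_0^T R\left(\nu^n \,\|\, \tilde{\Gamma}^n\right) d\bar{x}^n_1
    + \int_0^T R\left(\nu^n \,\|\, \tilde{\Gamma}^n\right) d\bar{x}^n_2 \right] \\
  &= \bar{E}^n \left[ \bar{x}^n_1(T) \,
      R\left( \frac{\nu^n \otimes \bar{x}^n_1}{\bar{x}^n_1(T)} \,\Big\|\,
              \frac{\tilde{\Gamma}^n \otimes \bar{x}^n_1}{\bar{x}^n_1(T)} \right)
    + \bar{x}^n_2(T) \,
      R\left( \frac{\nu^n \otimes \bar{x}^n_2}{\bar{x}^n_2(T)} \,\Big\|\,
              \frac{\tilde{\Gamma}^n \otimes \bar{x}^n_2}{\bar{x}^n_2(T)} \right) \right] .
\end{aligned}
\]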

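According to Theorem 3.1, given any subsequence of $\mathbb{N}$ we can find a
further subsequence (again denoted by $n$) along which we have the convergence
in distribution of $(\bar{\Gamma}^n, \bar{x}^n_k,
\tilde{\Gamma}^n \otimes \bar{x}^n_k, \nu^n \otimes \bar{x}^n_k, k = 1, 2)$.
Using Fatou's Lemma (for convergence in distribution) and the lower
semicontinuity of relative entropy [3, Lemma 1.4.3],

\[
\begin{aligned}
\liminf_{n \to \infty} \bar{E}^n \left[ \frac{1}{n} \sum_{l=1}^{N_n}
    R\left(\nu^n(l/n) \,\|\, \bar{\Gamma}^n(l/n)\right) \right]
  &\ge \liminf_{n \to \infty} \bar{E}^n \left[ \bar{x}^n_1(T) \,
      R\left( \frac{\nu^n \otimes \bar{x}^n_1}{\bar{x}^n_1(T)} \,\Big\|\,
              \frac{\tilde{\Gamma}^n \otimes \bar{x}^n_1}{\bar{x}^n_1(T)} \right)
    + \bar{x}^n_2(T) \,
      R\left( \frac{\nu^n \otimes \bar{x}^n_2}{\bar{x}^n_2(T)} \,\Big\|\,
              \frac{\tilde{\Gamma}^n \otimes \bar{x}^n_2}{\bar{x}^n_2(T)} \right) \right] \\
  &\ge \bar{E} \left[ \bar{x}_1(T) \,
      R\left( \frac{\bar{\theta}^1 \otimes \bar{x}_1}{\bar{x}_1(T)} \,\Big\|\,
              \frac{\bar{\Gamma} \otimes \bar{x}_1}{\bar{x}_1(T)} \right)
    + \bar{x}_2(T) \,
      R\left( \frac{\bar{\theta}^2 \otimes \bar{x}_2}{\bar{x}_2(T)} \,\Big\|\,
              \frac{\bar{\Gamma} \otimes \bar{x}_2}{\bar{x}_2(T)} \right) \right] \\
  &= \bar{E} \left[ \int_0^T R\left(\bar{\theta}^1(t) \,\|\, \bar{\Gamma}(t)\right) d\bar{x}_1(t)
    + \int_0^T R\left(\bar{\theta}^2(t) \,\|\, \bar{\Gamma}(t)\right) d\bar{x}_2(t) \right] .
\end{aligned}
\]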


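We claim that since $\{x^n, n \in \mathbb{N}\}$ satisfies a large deviation
principle on $\mathcal{T}$ with rate function $J$,

\[
\liminf_{n \to \infty} \frac{1}{n} R\left(\lambda^n \,\|\, \Lambda^n\right)
  \ge \bar{E} J(\bar{x}) .
\tag{3.3}
\]

Indeed, it follows from the variational representation that for all bounded
and continuous functions $G : \mathcal{T} \to \mathbb{R}$,

\[
\liminf_{n \to \infty} \bar{E}^n \left[ \frac{1}{n}
    R\left(\lambda^n \,\|\, \Lambda^n\right) + G(\bar{x}^n) \right]
  \ge \inf_{x \in \mathcal{T}} \left[ J(x) + G(x) \right] .
\]

Thus

\[
\liminf_{n \to \infty} \frac{1}{n} R\left(\lambda^n \,\|\, \Lambda^n\right)
  \ge \inf_{x \in \mathcal{T}} \left[ J(x) + G(x) \right] - \bar{E}\left[ G(\bar{x}) \right],
\]

and since $G$ is arbitrary,

\[
\liminf_{n \to \infty} \frac{1}{n} R\left(\lambda^n \,\|\, \Lambda^n\right)
  \ge \sup_{G \in C_b(\mathcal{T})} \left\{
    \inf_{x \in \mathcal{T}} \left[ J(x) + G(x) \right] - \bar{E}\left[ G(\bar{x}) \right] \right\} .
\]

We claim that the right hand side of this display is bounded below by
$\bar{E} J(\bar{x})$. Let $-G_r$ be a sequence of bounded, non-negative
continuous functions that converge up to $J$ as $r \to \infty$. It follows
that $J(x) + G_r(x) \ge 0$ for all $r$ and $x \in \mathcal{T}$, and so
$\inf_{x \in \mathcal{T}} [J(x) + G_r(x)] \ge 0$. Since the monotone
convergence theorem implies
$\bar{E}[-G_r(\bar{x})] \uparrow \bar{E}[J(\bar{x})]$, the result now follows.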

We have the following inequalities, each of which is explained after the
display.

\[
\begin{aligned}
\liminf_{n \to \infty} -\frac{1}{n} \log E \exp\left[-n F(\Gamma^n)\right]
  &= \liminf_{n \to \infty} \bar{E}^n \left[ \frac{1}{n}
      R\left(\mu^n \,\|\, \Pi^n \otimes \Lambda^n\right) + F(\bar{\Gamma}^n) \right] \\
  &\ge \liminf_{n \to \infty} \bar{E}^n \left[ \frac{1}{n} \sum_{l=1}^{N_n}
      R\left(\nu^n(l/n) \,\|\, \bar{\Gamma}^n(l/n)\right)
      + \frac{1}{n} R\left(\lambda^n \,\|\, \Lambda^n\right) + F(\bar{\Gamma}^n) \right] \\
  &\ge \bar{E} \left[ \int_0^T R\left(\bar{\theta}^1(t) \,\|\, \bar{\Gamma}(t)\right) d\bar{x}_1(t)
      + \int_0^T R\left(\bar{\theta}^2(t) \,\|\, \bar{\Gamma}(t)\right) d\bar{x}_2(t)
      + J(\bar{x}) + F(\bar{\Gamma}) \right] \\
  &\ge \bar{E} \left[ I(\bar{\Gamma}) + F(\bar{\Gamma}) \right] \\
  &\ge \inf_{\Gamma \in S} \left[ I(\Gamma) + F(\Gamma) \right] .
\end{aligned}
\]

The first equality is due to the relative entropy representation and the fact
that $\mu^n$ is a minimizer; the following inequality uses the decomposition
(2.4) and the bound (3.2); the second inequality uses the bound (3.3), the
bound immediately above (3.3), the convergence in distribution of
$\bar{\Gamma}^n$ to $\bar{\Gamma}$, and the continuity of $F$; the third
inequality uses the properties of the limit processes stated in Theorem 3.1
and the definition of the rate function; the final inequality is obvious. We
have proved that given any subsequence of $\mathbb{N}$ there is a further
subsequence along which (3.1) holds. By the usual argument by contradiction,
(3.1) holds as stated. □

4  Properties  of  the  Rate  Function 

In  this  section  we  prove  some  important  properties  of  the  rate  function. 

Theorem 4.1 Under Assumption 2.1 the set $\{\Gamma : I(\Gamma) \le M\}$ is compact for each $M \in [0, \infty)$.

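Proof: Since all paths $\Gamma$ with $I(\Gamma) < \infty$ are Lipschitz continuous with a common constant, we need only show that $\Gamma \mapsto I(\Gamma)$ is lower semicontinuous. Let $\Gamma_n \to \Gamma$ as $n \to \infty$. If $\liminf_{n\to\infty} I(\Gamma_n) < I(\Gamma)$, then we can extract a subsequence (again denoted by $n$) such that $I(\Gamma_n)$ converges and $\lim_{n\to\infty} I(\Gamma_n) = I(\Gamma) - \varepsilon$ for some $\varepsilon > 0$. Let $\theta^{k,n}$, $k = 1, 2$, and $x^n$ be associated rates and cumulative coloration processes that are within $1/n$ of optimal, that is, which satisfy

$$\dot{\Gamma}_n = \dot{x}_1^n T^1[\theta^{1,n}] + \dot{x}_2^n T^2[\theta^{2,n}]$$

and

$$I(\Gamma_n) \ge \int_0^\tau \left[ \dot{x}_1^n R(\theta^{1,n} \,\|\, \Gamma_n) + \dot{x}_2^n R(\theta^{2,n} \,\|\, \Gamma_n) \right] dt + J(x^n) - \frac{1}{n}.$$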

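Exactly as in the proof of the convergence theorem (Theorem 3.1), the uniformly Lipschitz continuous processes $(\Gamma_n, x_k^n, \theta^{k,n} \otimes x_k^n, \Gamma_n \otimes x_k^n, k = 1, 2)$ converge, at least along a subsequence, to a collection of processes $(\Gamma, x_k, \theta^k \otimes x_k, \Gamma \otimes x_k, k = 1, 2)$. Using the lower semicontinuity of $J$ and of the relative entropy,

$$
\begin{aligned}
&\liminf_{n\to\infty} \left\{ \int_0^\tau \left[ \dot{x}_1^n R(\theta^{1,n} \,\|\, \Gamma_n) + \dot{x}_2^n R(\theta^{2,n} \,\|\, \Gamma_n) \right] dt + J(x^n) \right\} \\
&\quad= \liminf_{n\to\infty} \left\{ x_1^n(\tau) R\left( \frac{\theta^{1,n} \otimes x_1^n}{x_1^n(\tau)} \,\Big\|\, \frac{\Gamma_n \otimes x_1^n}{x_1^n(\tau)} \right) + x_2^n(\tau) R\left( \frac{\theta^{2,n} \otimes x_2^n}{x_2^n(\tau)} \,\Big\|\, \frac{\Gamma_n \otimes x_2^n}{x_2^n(\tau)} \right) + J(x^n) \right\} \\
&\quad\ge x_1(\tau) R\left( \frac{\theta^{1} \otimes x_1}{x_1(\tau)} \,\Big\|\, \frac{\Gamma \otimes x_1}{x_1(\tau)} \right) + x_2(\tau) R\left( \frac{\theta^{2} \otimes x_2}{x_2(\tau)} \,\Big\|\, \frac{\Gamma \otimes x_2}{x_2(\tau)} \right) + J(x) \\
&\quad= \int_0^\tau \left[ \dot{x}_1 R(\theta^{1} \,\|\, \Gamma) + \dot{x}_2 R(\theta^{2} \,\|\, \Gamma) \right] dt + J(x).
\end{aligned}
$$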


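Since we also have

$$
\begin{aligned}
\Gamma(t) - \Gamma(0) &= \lim_{n\to\infty} \left( \Gamma_n(t) - \Gamma_n(0) \right) \\
&= \lim_{n\to\infty} \int_0^t \left[ \dot{x}_1^n T^1[\theta^{1,n}] + \dot{x}_2^n T^2[\theta^{2,n}] \right] ds \\
&= \lim_{n\to\infty} \left[ T^1\left[ \int_0^t \dot{x}_1^n \theta^{1,n}\, ds \right] + T^2\left[ \int_0^t \dot{x}_2^n \theta^{2,n}\, ds \right] \right] \\
&= T^1\left[ \int_0^t \dot{x}_1 \theta^{1}\, ds \right] + T^2\left[ \int_0^t \dot{x}_2 \theta^{2}\, ds \right] \\
&= \int_0^t \left[ \dot{x}_1 T^1[\theta^1] + \dot{x}_2 T^2[\theta^2] \right] ds,
\end{aligned}
$$

we conclude that

$$I(\Gamma) \le \int_0^\tau \left[ \dot{x}_1 R(\theta^1 \,\|\, \Gamma) + \dot{x}_2 R(\theta^2 \,\|\, \Gamma) \right] dt + J(x) \le \lim_{n\to\infty} I(\Gamma_n),$$

a contradiction. Therefore, $\liminf_{n\to\infty} I(\Gamma_n) \ge I(\Gamma)$. □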


It is also true that given any $\Gamma$, infimizing $\theta$'s and $x$'s exist.

Lemma 4.2 Let $\Gamma \in \mathcal{S}$ be given. There exist measurable functions $\theta^k$, $k = 1, 2$, and $x \in \mathcal{T}$ which achieve the infimum in the definition of $I(\Gamma)$.


Proof: Since the proof uses the same ideas as that of the previous theorem, the argument is only sketched. If $I(\Gamma) = \infty$ there is nothing to prove. If $I(\Gamma) < \infty$ then there exist $\theta^{k,n}$, $k = 1, 2$, and $x^n \in \mathcal{T}$ such that

$$\dot{\Gamma} = \dot{x}_1^n T^1[\theta^{1,n}] + \dot{x}_2^n T^2[\theta^{2,n}]$$

and

$$\int_0^\tau \left[ \dot{x}_1^n R(\theta^{1,n} \,\|\, \Gamma) + \dot{x}_2^n R(\theta^{2,n} \,\|\, \Gamma) \right] dt + J(x^n) \le I(\Gamma) + \frac{1}{n}.$$

Arguing exactly as in the previous theorem, we can consider the limit $n \to \infty$ (along a subsequence) and construct the minimizing $\theta^k$, $k = 1, 2$, and $x \in \mathcal{T}$. □


A special but common case is when the coloration rate function takes the form

$$J(x) = \int_0^\tau M(\dot{x}_1, \dot{x}_2)\, dt$$

for some convex proper function $M : \mathbb{R}^2 \to [0, \infty]$ (see the examples of Section 1). In this case we can write $I$ as

$$I(\Gamma) = \int_0^\tau L(\Gamma, \dot{\Gamma})\, dt$$

in terms of a local rate function

$$L(\Gamma, \eta) = \inf\left\{ a_1 R(\theta^1 \,\|\, \Gamma) + a_2 R(\theta^2 \,\|\, \Gamma) + M(a_1, a_2) : a \in \mathcal{C},\ \theta^k \in \mathcal{U},\ \eta = a_1 T^1[\theta^1] + a_2 T^2[\theta^2] \right\}, \tag{4.1}$$

where $\mathcal{C}$ is the set of probability distributions on $\{1, 2\}$. We will rely on the following assumption in the proof of the lower bound.


Assumption 4.1 Assumption 2.1 holds, and in addition:

(A) The rate function $J(x)$ takes the form $\int_0^\tau M(\dot{x})\, dt$, where $M$ is a proper convex function.

(B) There is a point $a \in \mathcal{C}$ such that $M(a) = 0$ and $a_k > 0$, $k = 1, 2$.


As noted previously, the assumed form for $J$ is typical. Since $M$ is a rate function, there is at least one probability vector $a$ at which $M(a) = 0$. The assumption that this occurs at a point where both components are positive is very mild.

In the remainder of this section we will construct processes and controls that will be used in the proof of the large deviation lower bound. We first define the natural occupancy path corresponding to a given colorization process, and the zero-cost path. For $y \in [0, \infty)$ and $i \in \mathbb{Z}$ let

$$\mathcal{P}_i(y) = \begin{cases} \dfrac{y^i}{i!} e^{-y}, & i \ge 0, \\[4pt] 0, & i < 0 \end{cases}$$

denote the $i$-th component of a Poisson distribution with mean $y$, and let $Q_i(y) = \sum_{j > i} \mathcal{P}_j(y)$ be the Poisson tail probability function. The product of two independent Poisson distributions with means $y_1$ and $y_2$ defines a mapping $\Phi : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathcal{U}$ given by

$$
\begin{aligned}
\Phi_{i,j}(y_1, y_2) &= \mathcal{P}_i(y_1)\, \mathcal{P}_j(y_2), \\
\Phi_{i,J+}(y_1, y_2) &= \mathcal{P}_i(y_1)\, Q_J(y_2), \\
\Phi_{I+,j}(y_1, y_2) &= Q_I(y_1)\, \mathcal{P}_j(y_2), \\
\Phi_{I+,J+}(y_1, y_2) &= Q_I(y_1)\, Q_J(y_2).
\end{aligned}
$$

The mapping $\Phi$ is the limiting (as $n \to \infty$) mean urn occupancy distribution for an experiment in which $n y_k$ balls of color $k$ are thrown into $n$ urns.

Lemma 4.3 (Natural occupancy path) Suppose that condition (A) of Assumption 4.1 holds. Let $x \in \mathcal{T}$ be a colorization process with finite cost $J(x)$. The natural occupancy path corresponding to $x$, defined by $\Gamma^*(t) = \Phi(x_1(t), x_2(t))$, is an occupancy path in $\mathcal{S}$ which satisfies the initial condition $\Gamma^*_{0,0}(0) = 1$ and the bound $I(\Gamma^*) \le J(x)$.

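Proof: The initial condition is immediate from the fact that $x(0) = 0$. The continuity of $x$ and of the Poisson distribution with respect to its mean ensure that $\Gamma^* \in \mathcal{S}$. To establish the bound on $I(\Gamma^*)$, we will show that the derivative of the path satisfies the differential equation

$$\dot{\Gamma}^* = \dot{x}_1 T^1[\Gamma^*] + \dot{x}_2 T^2[\Gamma^*].$$

Then according to the definition of the rate function,

$$I(\Gamma^*) \le \int_0^\tau \left( \sum_{k=1}^2 \dot{x}_k R(\Gamma^* \,\|\, \Gamma^*) + M(\dot{x}) \right) dt = \int_0^\tau M(\dot{x})\, dt = J(x).$$

Note that $\frac{d}{dx} \mathcal{P}_i(x) = \mathcal{P}_{i-1}(x) - \mathcal{P}_i(x)$. Then, for $i \le I$ and $j \le J$, we have

$$
\begin{aligned}
\dot{\Gamma}^*_{ij} &= \dot{x}_1 \left[ \mathcal{P}_{i-1}(x_1) - \mathcal{P}_i(x_1) \right] \mathcal{P}_j(x_2) + \dot{x}_2\, \mathcal{P}_i(x_1) \left[ \mathcal{P}_{j-1}(x_2) - \mathcal{P}_j(x_2) \right] \\
&= \dot{x}_1 \left( \Gamma^*_{i-1,j} - \Gamma^*_{ij} \right) + \dot{x}_2 \left( \Gamma^*_{i,j-1} - \Gamma^*_{ij} \right)
\end{aligned}
$$

as desired. Using the fact that $\frac{d}{dx} Q_i(x) = \mathcal{P}_i(x)$, the cases involving $i = I+$ or $j = J+$ follow similarly. □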
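The differential equation used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the linear coloring $x(t) = (a_1 t, a_2 t)$, and the helper names (`poisson_pmf`, `poisson_tail`, `occupancy_map`, `ode_residual`) are hypothetical. The index `I + 1` stands for the class $I+$ and `J + 1` for $J+$.

```python
import math

def poisson_pmf(i, y):
    """P_i(y): Poisson probability of the value i under mean y (zero for i < 0)."""
    if i < 0:
        return 0.0
    return math.exp(-y) * y**i / math.factorial(i)

def poisson_tail(i, y):
    """Q_i(y): probability of a value strictly greater than i."""
    return 1.0 - sum(poisson_pmf(j, y) for j in range(i + 1))

def occupancy_map(y1, y2, I, J):
    """Phi(y1, y2) as a dict; index I + 1 plays the role of 'I+' and J + 1 of 'J+'."""
    rows = [poisson_pmf(i, y1) for i in range(I + 1)] + [poisson_tail(I, y1)]
    cols = [poisson_pmf(j, y2) for j in range(J + 1)] + [poisson_tail(J, y2)]
    return {(i, j): rows[i] * cols[j] for i in range(I + 2) for j in range(J + 2)}

def ode_residual(t, a1, a2, i, j, I, J, h=1e-6):
    """Central-difference d/dt of Phi_ij(a1 t, a2 t) minus the ODE right-hand side (i <= I, j <= J)."""
    phi = lambda s: occupancy_map(a1 * s, a2 * s, I, J)
    lhs = (phi(t + h)[(i, j)] - phi(t - h)[(i, j)]) / (2 * h)
    now = phi(t)
    # Classes with a negative index do not exist and contribute zero mass.
    rhs = (a1 * (now.get((i - 1, j), 0.0) - now[(i, j)])
           + a2 * (now.get((i, j - 1), 0.0) - now[(i, j)]))
    return lhs - rhs

start = occupancy_map(0.0, 0.0, I=2, J=3)            # all urns empty at time 0
total_mass = sum(occupancy_map(0.7, 1.3, I=2, J=3).values())
worst = max(abs(ode_residual(1.5, 0.4, 0.6, i, j, I=2, J=3))
            for i in range(3) for j in range(4))
```

Here `start[(0, 0)]` reproduces the initial condition, `total_mass` confirms that $\Phi$ is a probability distribution, and `worst` is the largest finite-difference residual over the classes with $i \le I$, $j \le J$.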


The following lemma is immediate, using the linear colorization process $x(t) = at$.

Lemma 4.4 (Zero cost path) Suppose that Assumption 4.1 holds, with $a \in \mathcal{C}$ such that $M(a) = 0$. Then the function $Z(t) = \Phi(a_1 t, a_2 t)$ is an occupancy path in $\mathcal{S}$ which satisfies $I(Z) = 0$ and the initial condition $Z_{0,0}(0) = 1$.

In the single color case analyzed in [4], the local rate function expressed by (4.1) takes the simpler form

$$L_s(\Gamma, \eta) = R(\Theta \,\|\, \Gamma),$$

where $\Gamma$ and $\Theta$ are probability distributions on $\{0, \ldots, I+\}$ and where $\Theta$ has the explicit form $\Theta_j = -\sum_{i=0}^{j} \eta_i$. In that case, the convexity of the relative entropy implies that the local rate function is a convex function of its arguments.

In the present case, the local rate function is a convex function of $\eta$ but is not necessarily jointly convex in $(\Gamma, \eta)$. It can be shown in fact that, under Assumption 4.1, $L$ is convex if and only if the function $M(a) + h(a)$ is convex for $a \in \mathcal{C}$, where $h(a) = -a_1 \log a_1 - a_2 \log a_2$ is the entropy function. It is easy to see that Examples 2.1 and 2.2 always satisfy this condition. However, it can also be demonstrated that Example 2.3 satisfies this condition if and only if the Markov transition probabilities satisfy $p_{11} + p_{22} \le 1$.

In our proof of the lower bound, we require a technical result showing that every valid occupancy path $\Gamma$ is close, both in sup norm and in cost, to an occupancy path for which each element is bounded away from zero by a power of $t$. This fact allows us to avoid explicitly considering occupancy paths which are close to the boundary after time $t = 0$. When the local rate function is convex, this technical result is easily demonstrated by slightly perturbing the given occupancy path in the direction of the zero-cost path (which itself avoids the boundary after $t = 0$).

We will use a modified form of this argument to establish this result without requiring convexity of the local rate function. We first show that $\Gamma$ is close to an occupancy path $\bar{\Gamma}$ whose optimal colorization process $\bar{x}$ has each component $\bar{x}_k(t)$ bounded below by a function of the form $\delta t$ for some $\delta > 0$. Secondly, we show that perturbing $\bar{\Gamma}$ in the direction of the natural occupancy process $\Gamma^*$ corresponding to $\bar{x}$ yields an occupancy process with the desired properties. These steps are undertaken in the next two lemmas.

Lemma 4.5 Suppose that Assumption 4.1 holds. Let $\Gamma \in \mathcal{S}$ be given and let $\varepsilon > 0$. There exists $\bar{\Gamma} \in \mathcal{S}$, with $I(\bar{\Gamma}) < \infty$ and associated optimal rates $\bar{x}$ and $\bar{\theta}$, which further satisfies the following properties:

• $I(\bar{\Gamma}) \le I(\Gamma) + \varepsilon$,

• $d(\Gamma, \bar{\Gamma}) \le \varepsilon$,

• there exists $\delta > 0$ such that $\bar{x}_k(t) \ge \delta t$ for all $t \in [0, \tau]$.


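Proof: We will construct $\bar{\Gamma}$ by following the zero cost path $Z$ of Lemma 4.4 for a short time $\Delta$ and then closely tracking the original process $\Gamma$ for $t > \Delta$.

The key step is to construct an occupancy process $\tilde{\Gamma}$ which has initial condition $\tilde{\Gamma}(0) = Z(\Delta)$, and which for sufficiently small $\Delta$ satisfies $d(\Gamma, \tilde{\Gamma}) \le \varepsilon/2$ and $I(\tilde{\Gamma}) \le I(\Gamma) + \varepsilon$.

Once $\tilde{\Gamma}$ has been constructed, we then may define

$$\bar{\Gamma}(t) = \begin{cases} Z(t), & 0 \le t \le \Delta, \\ \tilde{\Gamma}(t - \Delta), & \Delta \le t \le \tau. \end{cases}$$

Since the absolute value of the derivative of each element of an occupancy function with finite cost is bounded by 1, we immediately have $d(\tilde{\Gamma}, \bar{\Gamma}) \le \Delta$, and hence $d(\Gamma, \bar{\Gamma}) \le \varepsilon$ for sufficiently small $\Delta$. Moreover

$$I(\bar{\Gamma}) = 0 + \int_\Delta^\tau L\bigl(\bar{\Gamma}, \dot{\bar{\Gamma}}\bigr)\, dt \le I(\tilde{\Gamma}) \le I(\Gamma) + \varepsilon.$$

Finally, the optimal colorization rate for $\bar{\Gamma}$ on the interval $[0, \Delta]$ is given by $\bar{x}_k(t) = a_k t$. Because the colorization processes are monotonically increasing, we have the bound $\bar{x}_k(t) \ge \left( a_k (\Delta \wedge \tau)/\tau \right) t$ for all $t \in [0, \tau]$.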

It remains to construct an occupancy function $\tilde{\Gamma}$ with the required properties. We define

$$\tilde{\Gamma}(t) = e^{-\Delta} \left( \Gamma(t) - \Gamma(0) \right) + Z(\Delta) = e^{-\Delta}\, \Gamma(t) + (1 - e^{-\Delta})\, Z^c(\Delta),$$

where $Z(\Delta)$ is the zero-cost distribution of Lemma 4.4 at time $\Delta$, and where

$$Z^c(\Delta) = \frac{Z(\Delta) - e^{-\Delta}\, \Gamma(0)}{1 - e^{-\Delta}}$$

is the conditional distribution obtained from $Z(\Delta)$ by removing the probability mass from its $(0, 0)$ element. As $\tilde{\Gamma}$ is a convex combination of two elements of $\mathcal{S}$, we immediately have $\tilde{\Gamma} \in \mathcal{S}$, and it is also clear that $d(\Gamma, \tilde{\Gamma})$ can be made arbitrarily small by decreasing $\Delta$.

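It remains to establish the desired bound on $I(\tilde{\Gamma})$. If $\alpha \in \mathcal{U}$ is the distribution that puts all of its mass on the $(I+, J+)$ element, note that $T^k[\alpha] = 0$ for $k = 1, 2$, reflecting that balls thrown into $(I+, J+)$ urns have no effect on the occupancy state. Let $x$ and $\theta$ be the optimal rate processes for $\Gamma$. Then the rates $\tilde{x} = x$ and $\tilde{\theta}^k = e^{-\Delta} \theta^k + (1 - e^{-\Delta}) \alpha$ satisfy

$$\dot{\tilde{x}}_1 T^1[\tilde{\theta}^1] + \dot{\tilde{x}}_2 T^2[\tilde{\theta}^2] = e^{-\Delta} \left( \dot{x}_1 T^1[\theta^1] + \dot{x}_2 T^2[\theta^2] \right) = e^{-\Delta}\, \dot{\Gamma} = \dot{\tilde{\Gamma}}$$

and are therefore feasible rates for constructing $\tilde{\Gamma}$. Using $\tilde{x} = x$ and the joint convexity of the relative entropy, we have

$$
\begin{aligned}
I(\tilde{\Gamma}) &\le \int_0^\tau \sum_{k=1}^2 \dot{x}_k R(\tilde{\theta}^k \,\|\, \tilde{\Gamma})\, dt + J(x) \\
&\le e^{-\Delta} \int_0^\tau \sum_{k=1}^2 \dot{x}_k R(\theta^k \,\|\, \Gamma)\, dt + (1 - e^{-\Delta}) \int_0^\tau \sum_{k=1}^2 \dot{x}_k R\left( \alpha \,\|\, Z^c(\Delta) \right) dt + J(x) \\
&\le I(\Gamma) + \tau (1 - e^{-\Delta})\, R\left( \alpha \,\|\, Z^c(\Delta) \right).
\end{aligned}
$$

Since $R(\alpha \,\|\, Z^c(\Delta)) = -\log Z^c_{I+,J+}(\Delta)$ and $Z^c_{I+,J+}(\Delta) = Z_{I+,J+}(\Delta)/(1 - e^{-\Delta})$, the bound $Z_{I+,J+}(\Delta) \ge \mathcal{P}_{I+1}(a_1 \Delta)\, \mathcal{P}_{J+1}(a_2 \Delta)$ shows that the difference $I(\tilde{\Gamma}) - I(\Gamma)$ is bounded by

$$\tau (1 - e^{-\Delta}) \log\left( a_1^{-(I+1)} a_2^{-(J+1)} \Delta^{-(I+J+2)} (I+1)!\, (J+1)!\, e^{\Delta} (1 - e^{-\Delta}) \right),$$

which approaches zero as $\Delta \to 0$. □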
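As a quick numerical illustration of the last claim, the sketch below evaluates the bound for decreasing $\Delta$. It assumes the bound has the form $\tau(1 - e^{-\Delta}) \log\bigl(a_1^{-(I+1)} a_2^{-(J+1)} \Delta^{-(I+J+2)} (I+1)!\,(J+1)!\, e^{\Delta}(1 - e^{-\Delta})\bigr)$ obtained from the Poisson lower bound on $Z_{I+,J+}(\Delta)$; the function name `cost_gap_bound` and the parameter values are hypothetical.

```python
import math

def cost_gap_bound(delta, a1, a2, I, J, tau):
    """tau * (1 - e^{-delta}) * log(...) with the logarithm expanded term by term."""
    one_minus_exp = -math.expm1(-delta)           # 1 - e^{-delta}, accurate for small delta
    log_arg = (-(I + 1) * math.log(a1)
               - (J + 1) * math.log(a2)
               - (I + J + 2) * math.log(delta)
               + math.lgamma(I + 2)               # log (I+1)!
               + math.lgamma(J + 2)               # log (J+1)!
               + delta
               + math.log(one_minus_exp))
    return tau * one_minus_exp * log_arg

bounds = [cost_gap_bound(d, a1=0.5, a2=0.5, I=2, J=2, tau=1.0)
          for d in (1e-2, 1e-4, 1e-6)]
```

The bound behaves like $\Delta \log(1/\Delta)$ for small $\Delta$, so the entries of `bounds` shrink toward zero even though the logarithm itself diverges.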


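Lemma 4.6 Suppose that Assumption 4.1 holds. Let $\Gamma \in \mathcal{S}$ be given such that $I(\Gamma) < \infty$, and let $\varepsilon > 0$. Then there exist $\delta > 0$, $K \in \mathbb{N}$ and $\Gamma^\varepsilon \in \mathcal{S}$ with the following properties.

1. $I(\Gamma^\varepsilon) \le I(\Gamma) + \varepsilon$,

2. $d(\Gamma^\varepsilon, \Gamma) \le \varepsilon$,

3. $\Gamma^\varepsilon_{ij}(t) \ge \delta t^K$ for all $t \in (0, \tau]$ and $(i, j) \in \{0, 1, \ldots, I, I+\} \times \{0, 1, \ldots, J, J+\}$.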

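Proof: Using Lemma 4.5, there exist $\bar{\Gamma}$ and $\delta > 0$ with $d(\Gamma, \bar{\Gamma}) \le \varepsilon/2$, $I(\bar{\Gamma}) \le I(\Gamma) + \varepsilon$, and $\bar{x}_k \ge \delta t$, where $\bar{x}$ and $\bar{\theta}$ are the optimal rates used in the definition of $I(\bar{\Gamma})$. Let $\Gamma^* = \Phi(\bar{x}_1, \bar{x}_2)$ be the natural occupancy path corresponding to $\bar{x}$, and define

$$\Gamma^\varepsilon = (1 - \lambda)\, \bar{\Gamma} + \lambda\, \Gamma^*.$$

By the triangle inequality, $\Gamma^\varepsilon$ will be sufficiently close to $\Gamma$ if $\lambda \le \varepsilon/4$. Note that the rates $x^\varepsilon = \bar{x}$ and $\theta^{k,\varepsilon} = (1 - \lambda)\, \bar{\theta}^k + \lambda\, \Gamma^*$ are feasible rates for generating $\Gamma^\varepsilon$, so that

$$
\begin{aligned}
I(\Gamma^\varepsilon) &\le \int_0^\tau \sum_{k=1}^2 \dot{\bar{x}}_k R(\theta^{k,\varepsilon} \,\|\, \Gamma^\varepsilon)\, dt + J(\bar{x}) \\
&\le (1 - \lambda)\, I(\bar{\Gamma}) + \lambda\, I(\Gamma^*) \\
&\le I(\bar{\Gamma}) \\
&\le I(\Gamma) + \varepsilon,
\end{aligned}
$$

where we have used convexity of the relative entropy and the fact that

$$I(\Gamma^*) = J(\bar{x}) \le I(\bar{\Gamma}).$$

Finally, for $i \le I$, $j \le J$, the $(i, j)$-th element of $\Gamma^\varepsilon$ satisfies the lower bound

$$\Gamma^\varepsilon_{ij}(t) \ge \lambda\, \Gamma^*_{ij}(t) = \lambda\, \frac{\bar{x}_1^i(t)\, \bar{x}_2^j(t)}{i!\, j!}\, e^{-t} \ge \lambda\, \frac{\delta^{i+j} e^{-\tau}}{i!\, j!}\, t^{i+j}$$

for all $t \in [0, \tau]$. Similar bounds hold for cases involving $i = I+$ and $j = J+$, since $Q_I \ge \mathcal{P}_{I+1}$. Hence $\Gamma^\varepsilon$ has all of the desired properties. □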


The final lemma of this section is the principal result that will be used to support the proof of the lower bound.

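Lemma 4.7 Suppose that Assumption 4.1 holds. Let $\Gamma \in \mathcal{S}$ be given such that $I(\Gamma) < \infty$ and such that for some $\delta > 0$ and $K \in \mathbb{N}$ the lower bound $\Gamma_{ij}(t) \ge \delta t^K$ holds for all $(i, j) \in \{0, 1, \ldots, I, I+\} \times \{0, 1, \ldots, J, J+\}$. Let $x$, $\theta^1$ and $\theta^2$ satisfy

$$\dot{\Gamma} = \dot{x}_1 T^1[\theta^1] + \dot{x}_2 T^2[\theta^2]$$

and

$$I(\Gamma) = \int_0^\tau \left[ \dot{x}_1 R(\theta^1 \,\|\, \Gamma) + \dot{x}_2 R(\theta^2 \,\|\, \Gamma) \right] dt + J(x).$$

Given $\varepsilon > 0$ there exist $\Gamma^*$, $\theta^{1,*}$, $\theta^{2,*}$ and $\sigma > 0$ with the following properties.

1. $\dot{\Gamma}^* = \dot{x}_1 T^1[\theta^{1,*}] + \dot{x}_2 T^2[\theta^{2,*}]$, $\Gamma^*_{0,0}(0) = 1$,

2. $I(\Gamma^*) \le \int_0^\tau \left[ \dot{x}_1 R(\theta^{1,*} \,\|\, \Gamma^*) + \dot{x}_2 R(\theta^{2,*} \,\|\, \Gamma^*) \right] dt + J(x) \le I(\Gamma) + \varepsilon$,

3. $d(\Gamma^*, \Gamma) \le \varepsilon$,

4. The rate processes $\theta^{1,*}$ and $\theta^{2,*}$ are piecewise constant on $[0, \tau]$, with a finite number of intervals of constancy.

5. When restricted to $[0, \sigma)$, the rate processes are pure in the sense that on any interval of constancy $(s_1, s_2)$, and for each $k = 1, 2$, there is $(i, j) \in \{0, 1, \ldots, I, I+\} \times \{0, 1, \ldots, J, J+\}$ such that $\theta^{k,*}_{ij}(t) = 1$ for all $t \in (s_1, s_2)$. In addition, for $k = 1, 2$ and any interval of constancy on which $\theta^{k,*}_{ij}(t) = 1$, $\Gamma^*_{ij}(t) \ge \delta \sigma^K$.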

Proof: Suppose that $\Gamma$, $\theta^1$, $\theta^2$, and $x$ satisfy the assumptions of the lemma. Let $\sigma \in (0, \tau]$. By assumption, we have $\Gamma_{ij}(\sigma) \ge \delta \sigma^K$. We can choose $\sigma > 0$ such that $-\sigma \log(\delta \sigma^K) \le \varepsilon/2$, and such that if $\Gamma_1$ and $\Gamma_2$ are any occupancy processes with the same initial condition, then $\|\Gamma_1(s) - \Gamma_2(s)\| \le \varepsilon$ for all $s \in [0, \sigma]$ (here we use the common Lipschitz continuity for all occupancy functions). A time-reversed induction argument will be used to construct the pure controls on $[0, \sigma)$ described in the lemma. The main idea is that the occupancy path $\Gamma^*$ will proceed on $[0, \sigma]$ in such a way that each element increases to a maximum level before decreasing to its final value $\Gamma^*(\sigma) = \Gamma(\sigma)$. Hence the contents of any given occupancy class are only reduced at times when that class has at least a fraction $\delta \sigma^K$ of the urns. In other words, $\theta^{k,*}_{ij}(s) > 0$ implies $\Gamma^*_{ij}(s) \ge \delta \sigma^K$, allowing the cost of $\Gamma^*$ on $(0, \sigma)$ to be made arbitrarily small.

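For a given $\Gamma \in \mathcal{U}$, the associated minimum number of balls of each color per urn is given by applying the linear operators

$$
\begin{aligned}
\beta_1(\Gamma) &= \sum_{i=0}^{I} \sum_{j=0}^{J+} i\, \Gamma_{ij} + \sum_{j=0}^{J+} (I+1)\, \Gamma_{I+,j}, \\
\beta_2(\Gamma) &= \sum_{j=0}^{J} \sum_{i=0}^{I+} j\, \Gamma_{ij} + \sum_{i=0}^{I+} (J+1)\, \Gamma_{i,J+}.
\end{aligned} \tag{4.2}
$$

Because $x$ is a coloring process which generates $\Gamma(\sigma)$ it must satisfy $x_k(\sigma) \ge \beta_k(\Gamma(\sigma))$. As another way to see this point, one may verify the relations

$$
\begin{aligned}
\beta_1(T^1[\theta]) &= \sum_{i=0}^{I} \sum_{j=0}^{J+} \theta_{ij}, & \beta_1(T^2[\theta]) &= 0, \\
\beta_2(T^1[\theta]) &= 0, & \beta_2(T^2[\theta]) &= \sum_{i=0}^{I+} \sum_{j=0}^{J} \theta_{ij}.
\end{aligned} \tag{4.3}
$$

The definition of $\Gamma$ in terms of $x$, $\theta$ then implies that $0 \le \dot{\beta}_k(\Gamma) \le \dot{x}_k$.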
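The relations between the ball-count operators and the transition operators can be verified numerically. The sketch below is illustrative only: it assumes that $T^1[\theta]$ moves the mass $\theta_{ij}$ from class $(i, j)$ to class $(i+1, j)$ for $i \le I$ and leaves $(I+, j)$ unchanged (and symmetrically for $T^2$), which matches the differential equation for the natural occupancy path but is not spelled out in this section. The names `t_op` and `beta` are hypothetical, and the index `I + 1` again stands for $I+$.

```python
import random

def t_op(theta, color, I, J):
    """Net rate of change of the occupancy distribution when balls of the
    given color land in urn classes distributed according to theta."""
    out = {key: 0.0 for key in theta}
    for (i, j), mass in theta.items():
        if color == 1 and i <= I:        # (I+, j) urns stay in class (I+, j)
            out[(i, j)] -= mass
            out[(i + 1, j)] += mass
        elif color == 2 and j <= J:      # (i, J+) urns stay in class (i, J+)
            out[(i, j)] -= mass
            out[(i, j + 1)] += mass
    return out

def beta(gamma, color):
    """Minimum number of balls of the given color per urn; the '+' class
    counts as I + 1 (or J + 1) balls because of the index convention."""
    return sum((i if color == 1 else j) * mass for (i, j), mass in gamma.items())

random.seed(0)
I, J = 2, 3
weights = {(i, j): random.random() for i in range(I + 2) for j in range(J + 2)}
norm = sum(weights.values())
theta = {key: w / norm for key, w in weights.items()}   # a random rate distribution

lhs_same = beta(t_op(theta, 1, I, J), 1)                 # beta_1(T^1[theta])
rhs_same = sum(m for (i, j), m in theta.items() if i <= I)
lhs_cross = beta(t_op(theta, 2, I, J), 1)                # beta_1(T^2[theta])
```

Under these assumptions `lhs_same` agrees with `rhs_same` and `lhs_cross` vanishes, up to floating-point error.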

To simplify the exposition, we first assume $x_k(\sigma) = \beta_k(\Gamma(\sigma))$, and subsequently extend the argument to cover inequality.

For colors $k = 1, 2$, we define orderings $\xi_k : \{0, \ldots, I+\} \times \{0, \ldots, J+\} \to \mathbb{N}$ by

$$\xi_1(i, j) = (J + 1)\, i + j, \qquad \xi_2(i, j) = i + (I + 1)\, j,$$

where strictly speaking we substitute $i = I + 1$ or $j = J + 1$ on the right-hand side when $i = I+$ or $j = J+$ appears on the left. Observe that each class of urns corresponds to a distinct value of $\xi_k$. For any $\Gamma \in \mathcal{U}$, let $U(\Gamma)$ be the set of $(i, j)$ pairs such that $\Gamma_{ij} > 0$, and for $k = 1, 2$ define

$$\kappa_k(\Gamma) = \max_{(i,j) \in U(\Gamma)} \xi_k(i, j), \qquad u_k(\Gamma) = \operatorname*{arg\,max}_{(i,j) \in U(\Gamma)} \xi_k(i, j).$$

To begin the induction, we set $\tau_1 = \sigma$ and $\Gamma^*(\tau_1) = \Gamma(\sigma)$, noting that $x_k(\tau_1) = \beta_k(\Gamma^*(\tau_1))$, and initialize the induction variable as $m = 1$.

Denote the highest non-empty urn class under the color-$k$ ordering by $u_k^m = u_k(\Gamma^*(\tau_m))$, and denote the corresponding order number by $\kappa_k^m = \kappa_k(\Gamma^*(\tau_m))$. Now imagine in reverse time pulling balls from these urns so that mass drains from $\Gamma^*_{u_1^m}$ at a rate specified by $\dot{x}_1$ and from $\Gamma^*_{u_2^m}$ at a rate specified by $\dot{x}_2$. At some time $\tau_{m+1} < \tau_m$, one of the two urn classes will empty. If $u_1^m = u_2^m$, this time is given by

$$\tau_{m+1} = \max\left\{ t : \sum_{k=1}^2 \left( x_k(\tau_m) - x_k(t) \right) = \Gamma^*_{u_1^m}(\tau_m) \right\},$$

and otherwise it is given by

$$\tau_{m+1} = \max\left\{ t : x_1(\tau_m) - x_1(t) = \Gamma^*_{u_1^m}(\tau_m) \ \text{or}\ x_2(\tau_m) - x_2(t) = \Gamma^*_{u_2^m}(\tau_m) \right\}.$$

Suppose that $u_1^m = (i, j)$ for some $i > 0$. Then (see (4.2)),

$$x_1(\tau_m) = \beta_1(\Gamma^*(\tau_m)) \ge \Gamma^*_{u_1^m}(\tau_m),$$

so that there must be a solution $t \in [0, \tau_m)$ to $x_1(t) = x_1(\tau_m) - \Gamma^*_{u_1^m}(\tau_m)$. Together with similar analysis of $u_2^m$, this means that a solution $\tau_{m+1} \in [0, \tau_m)$ always exists unless $u_1^m = u_2^m = (0, 0)$. In the latter case, all urns are empty at time $\tau_m$, and $\kappa_1^m + \kappa_2^m = 0$.

Otherwise, if $\kappa_1^m + \kappa_2^m > 0$, we define

$$\theta^{1,*}(t) = e_{u_1^m - (1,0)} \quad \text{and} \quad \theta^{2,*}(t) = e_{u_2^m - (0,1)}$$

for all $t \in [\tau_{m+1}, \tau_m)$, where $e_{i,j}$ is the distribution in $\mathcal{U}$ which puts all mass on the $(i, j)$-th element. That is, during the $m$-th interval, in forward time, we are throwing balls into urns with occupancies $u_1^m - (1, 0)$ and $u_2^m - (0, 1)$ in order to fill the urn classes $u_k^m$. The process $\Gamma^*$ is then defined on the interval $[\tau_{m+1}, \tau_m)$ by the existing terminal condition $\Gamma^*(\tau_m)$ and the differential equation $\dot{\Gamma}^* = \sum_k \dot{x}_k T^k[\theta^{k,*}]$. Because the $\theta^{k,*}$ put all their mass on urn classes with $i \le I$, $j \le J$, the relations (4.3) establish that $x_k(\tau_{m+1}) = \beta_k(\Gamma^*(\tau_{m+1}))$, setting up the next induction step.

Note that $\tau_{m+1}$ was chosen so that at least one of the classes $u_k^m$ is empty at time $\tau_{m+1}$. This ensures the strict inequality $\sum_{k=1}^2 \kappa_k^{m+1} < \sum_{k=1}^2 \kappa_k^m$, so that the induction must terminate after a finite number of steps (say $M$) with $\kappa_1^M + \kappa_2^M = 0$, meaning that $\Gamma^*(\tau_M) = e_{0,0} = \Gamma(0)$. Moreover, the fact that $x_k(\tau_M) = \beta_k(\Gamma^*(\tau_M)) = 0$ shows that this occurs at time $\tau_M = 0$.

This establishes the existence of $\Gamma^*$, $\theta^*$ on $[0, \sigma)$ with $\Gamma^*(\sigma) = \Gamma(\sigma)$ and with $\theta^*$ consisting of pure, piecewise constant controls. In addition, $\Gamma^*_{ij}(t)$ increases during intervals when $u_1^m = (i - 1, j)$ or $u_2^m = (i, j - 1)$, and it decreases when $u_k^m = (i, j)$ for $k = 1, 2$. By construction, the order numbers $\kappa_k^m$ increase monotonically with decreasing $m$, ensuring that the intervals on which $\Gamma^*_{ij}$ increases precede the intervals of decrease.

Finally, we extend the argument to the case when $x_k(\sigma) > \beta_k(\Gamma(\sigma))$ for some $k \in \{1, 2\}$. Let $s_k$ be the last time in $[0, \sigma]$ such that $x_k(s_k) = \beta_k(\Gamma(\sigma))$, and assume without loss that $0 \le s_2 \le s_1 \le \sigma$. By virtue of (4.3), the event $\dot{x}_1(t) > \dot{\beta}_1(\Gamma(t))$ can only occur if balls of color 1 have been thrown into $(I+, j)$-occupied urns, which can only happen with finite cost if these urns hold some mass: $\sum_{j=0}^{J+} \Gamma_{I+,j}(t) > 0$.

On the interval $[s_1, \sigma)$, we may define $\theta^{1,*} = e_{(I+,0)}$ and $\theta^{2,*} = e_{(0,J+)}$. Throwing balls into such urns has no effect on the occupancy distribution, meaning that $\Gamma^*(t) = \Gamma(\sigma)$ is the solution on this interval to the equation

$$\dot{\Gamma}^* = \sum_{k=1}^2 \dot{x}_k T^k[\theta^{k,*}] = 0.$$

The desired property that $\theta^{k,*}_{ij}(t) > 0$ implies $\Gamma^*_{ij}(t) \ge \delta \sigma^K$ is trivially satisfied on this interval.

At time $s_1$, we have $x_1(s_1) = \beta_1(\Gamma^*(s_1))$ and $x_2(s_1) > \beta_2(\Gamma^*(s_1))$. On the interval $[s_2, s_1)$, we use a modified reverse-time induction in which

$$\tau_{m+1} = \max\left\{ t : x_1(\tau_m) - x_1(t) = \Gamma^*_{u_1^m}(\tau_m) \ \text{or}\ t = s_2 \right\}.$$

On the $m$-th subinterval, the occupancy rates naturally are defined as $\theta^{1,*} = e_{u_1^m - (1,0)}$ and $\theta^{2,*} = e_{0,J+}$. Using similar arguments to before, it follows that after a finite number of steps, the induction terminates with $\tau_M = s_2$, with $\Gamma^*_{0,J+}(s_2) \ge \Gamma_{0,J+}(\sigma)$, and with $\kappa_1^m$ strictly decreasing in $m$. Since now $x_k(s_2) = \beta_k(\Gamma^*(s_2))$, $k = 1, 2$, the original induction argument may be used to continue the definition of $\Gamma^*$ on $[0, s_2)$. Now $\kappa_1^m$ and $\kappa_2^m$ are both strictly decreasing in $m$ across the entire interval $[0, \sigma)$, and we have the desired property that $\theta^{k,*}_{ij}(s) > 0$ implies $\Gamma^*_{ij}(s) \ge \Gamma_{ij}(\sigma)$ for $s \in [0, \sigma]$.

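This completes the construction of the processes $\Gamma^*$, $\theta^{1,*}$, $\theta^{2,*}$ on $[0, \sigma]$. Note that

$$
\begin{aligned}
\int_0^\sigma \left[ \dot{x}_1 R(\theta^{1,*} \,\|\, \Gamma^*) + \dot{x}_2 R(\theta^{2,*} \,\|\, \Gamma^*) \right] ds
&\le -\int_0^\sigma \left[ \dot{x}_1 \log(\delta \sigma^K) + \dot{x}_2 \log(\delta \sigma^K) \right] ds \\
&= -\sigma \log(\delta \sigma^K) \\
&\le \varepsilon/2,
\end{aligned}
$$

and that $\Gamma^*$ deviates no more than $\varepsilon$ from $\Gamma$ on $[0, \sigma]$, while ending up at the same place: $\Gamma^*(\sigma) = \Gamma(\sigma)$.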
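The cost estimate on $[0, \sigma]$ rests on two elementary facts: a pure control $e_{ij}$ has relative entropy $-\log \Gamma^*_{ij}$ with respect to $\Gamma^*$, and $-\sigma \log(\delta \sigma^K) \to 0$ as $\sigma \to 0$. The sketch below illustrates both; the halving search in `choose_sigma` and the parameter values are assumptions made for illustration, not the selection rule of the proof.

```python
import math

def relative_entropy(p, q):
    """R(p || q) for finite distributions given as equal-length lists (0 log 0 = 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def choose_sigma(delta, K, eps, tau):
    """Halve sigma, starting from tau, until -sigma * log(delta * sigma**K) <= eps / 2."""
    sigma = tau
    while -sigma * math.log(delta * sigma**K) > eps / 2:
        sigma /= 2
    return sigma

# A pure control costs exactly -log of the mass in the class it targets.
pure_cost = relative_entropy([0.0, 1.0, 0.0], [0.2, 0.5, 0.3])

sigma = choose_sigma(delta=0.1, K=3, eps=0.01, tau=1.0)
cost_cap = -sigma * math.log(0.1 * sigma**3)
```

Because $\dot{x}_1 + \dot{x}_2 = 1$, `cost_cap` bounds the whole integral over $[0, \sigma]$ whenever every targeted class holds mass at least $\delta \sigma^K$.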

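The construction on $[\sigma, \tau]$ is simpler. Let $M \in \mathbb{N}$, and observe that $\Gamma_{ij}(s)$ is uniformly bounded away from zero for all $i, j$ and $s \in [\sigma, \tau]$. We partition $[\sigma, \tau]$ into $M$ subintervals of length $c_M = (\tau - \sigma)/M$. On each interval we set

$$\theta^{k,*}_{ij}(s) = \frac{\displaystyle \int_{\sigma + l c_M}^{\sigma + (l+1) c_M} \dot{x}_k(r)\, \theta^k_{ij}(r)\, dr}{x_k(\sigma + (l+1) c_M) - x_k(\sigma + l c_M)}$$

if $\sigma + l c_M \le s < \sigma + (l+1) c_M$ (the definition is unimportant if $x_k(\sigma + (l+1) c_M) - x_k(\sigma + l c_M) = 0$). For $s \in [\sigma, \tau]$ let

$$\dot{\Gamma}^* = \dot{x}_1 T^1[\theta^{1,*}] + \dot{x}_2 T^2[\theta^{2,*}], \qquad \Gamma^*(\sigma) = \Gamma(\sigma).$$

Since $\Gamma_{ij}(s)$ is uniformly bounded away from zero, it is easy to check that for large enough $M$, $\Gamma^*$ is a valid occupancy path that is associated with the processes $\theta^{1,*}$, $\theta^{2,*}$ and $x$. The convergence $\Gamma^* \to \Gamma$ on $[\sigma, \tau]$ as $M \to \infty$ is immediate, and it follows from the Lebesgue Dominated Convergence Theorem that

$$\lim_{M \to \infty} \int_\sigma^\tau \left[ \dot{x}_1 R(\theta^{1,*} \,\|\, \Gamma^*) + \dot{x}_2 R(\theta^{2,*} \,\|\, \Gamma^*) \right] ds = \int_\sigma^\tau \left[ \dot{x}_1 R(\theta^1 \,\|\, \Gamma) + \dot{x}_2 R(\theta^2 \,\|\, \Gamma) \right] ds.$$

Therefore all parts of the lemma hold for sufficiently large $M$. □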
5  Proof of the Large Deviation Lower Bound

Theorem 5.1 Under Assumption 4.1

$$\limsup_{n \to \infty} -\frac{1}{n} \log E \exp\left[-nF(\Gamma^n)\right] \le \inf_{\Gamma \in \mathcal{S}} \left[ I(\Gamma) + F(\Gamma) \right].$$


Proof: Consider any $\Gamma$ for which $I(\Gamma) < \infty$. Then it suffices to show that

$$\limsup_{n \to \infty} -\frac{1}{n} \log E \exp\left[-nF(\Gamma^n)\right] \le I(\Gamma) + F(\Gamma).$$

Owing to Lemma 4.6 and the continuity of $F$, we can assume without loss that there are $\delta > 0$ and $K \in \mathbb{N}$ such that $\Gamma_{ij}(t) \ge \delta t^K$.

We again utilize the representation (2.3). Fix $b > 0$. According to the representation, the inequality in the last display will follow if we can find a sequence $\{\mu^n, n \in \mathbb{N}\}$ such that

$$\limsup_{n \to \infty} \frac{1}{n} R\left( \mu^n \,\|\, \Pi^n \otimes \Lambda^n \right) \le I(\Gamma) + b, \tag{5.1}$$

and such that if $\bar{\Gamma}^n$ is the urn process constructed under the distribution $\mu^n$, then

$$\limsup_{n \to \infty} P^n\left\{ d(\bar{\Gamma}^n, \Gamma) \ge b \right\} \le b. \tag{5.2}$$

To prove the desired bound we must construct an appropriate sequence of measures $\mu^n$. For $\Gamma$ as above, let $\theta^1$, $\theta^2$ and $x$ denote corresponding rate and coloration processes which achieve the infimum. Without loss we can assume these processes satisfy the properties ascribed to $\theta^{1,*}$ and $\theta^{2,*}$ in Lemma 4.7.

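For $a > 0$ let $G : \mathcal{T} \to \mathbb{R}$ be continuous and satisfy

$$G(\bar{x}) = \begin{cases} 1/a & \text{if } d(\bar{x}, x) \ge 2a, \\ 0 & \text{if } d(\bar{x}, x) \le a, \end{cases}$$

and also $G(\bar{x}) \in [0, 1/a]$ for all $\bar{x} \in \mathcal{T}$. Since $\{x^n, n \in \mathbb{N}\}$ satisfies a large deviation principle with rate function $J$, there is a sequence $\{\lambda^n, n \in \mathbb{N}\}$ satisfying

$$E^n\left[ \frac{1}{n} R\left( \lambda^n \,\|\, \Lambda^n \right) + G(\bar{x}^n) \right] \to \inf_{\bar{x} \in \mathcal{T}} \left[ J(\bar{x}) + G(\bar{x}) \right] \le J(x).$$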
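One concrete choice of such a function $G$ is a linear ramp in the distance $d(\bar{x}, x)$. The sketch below is only an illustration of the required properties; the proof needs nothing beyond continuity and the stated bounds, and the name `penalty` is hypothetical.

```python
def penalty(dist, a):
    """G expressed through dist = d(xbar, x): zero for dist <= a,
    equal to 1/a for dist >= 2a, and linear in between."""
    ramp = min(1.0, max(0.0, (dist - a) / a))
    return ramp / a

a = 0.1
samples = [penalty(0.01 * step, a) for step in range(41)]   # distances 0.00 .. 0.40
```

Since $G \ge (1/a)\,\mathbf{1}\{d(\bar{x}, x) \ge 2a\}$, the probability of a deviation of size $2a$ is at most $a$ times the expectation of $G$, which is how the penalty is used in the argument.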
We use these measures in constructing $\mu^n$ by setting

$$
\begin{aligned}
&\mu^n(dx_1, \ldots, dx_{N_n}, dy_1, \ldots, dy_{N_n}) \\
&\quad= \mu_1^n(dx_1 \mid y_1, \ldots, y_{N_n}) \cdots \mu_{N_n}^n(dx_{N_n} \mid x_1, \ldots, x_{N_n - 1}, y_1, \ldots, y_{N_n})\, \lambda^n(dy_1, \ldots, dy_{N_n}).
\end{aligned}
$$

We will need to know how much $\lambda^n$ mass is placed on sequences $y_1, \ldots, y_{\lfloor n\tau \rfloor + 1}$ such that if $\bar{x}^n$ is the corresponding cumulative coloration process, then

$$\sup_{t \in [0, \tau]} d(\bar{x}^n(t), x(t)) \ge 2a.$$

We have

$$
\begin{aligned}
\limsup_{n \to \infty} \frac{1}{a} P^n\left\{ \sup_{t \in [0, \tau]} d(\bar{x}^n(t), x(t)) \ge 2a \right\}
&\le \limsup_{n \to \infty} E^n\left[ G(\bar{x}^n) \right] \\
&\le \inf_{\bar{x} \in \mathcal{T}} \left[ J(\bar{x}) + G(\bar{x}) \right] \\
&\le J(x) \\
&< \infty.
\end{aligned}
$$

Thus the probability $P^n\left\{ \sup_{t \in [0, \tau]} d(\bar{x}^n(t), x(t)) \ge 2a \right\}$ can be made as small as desired for large $n$ by taking $a$ small. Since $\|F\|_\infty < \infty$, when constructing the controlled urn process we will be able to ignore these paths, and can let the measures that select the urns be the original uniform measure for such points in the underlying probability space. The relative entropy cost for such paths is then zero. Thus in the rest of this construction we focus on the case where the underlying coloration process satisfies

$$\sup_{t \in [0, \tau]} d(\bar{x}^n(t), x(t)) < 2a.$$

To finish the construction we must specify the conditional distributions of the $\bar{X}_l^n$. Note that when specifying these distributions we get to see the complete outcome of the coloration process, and can assume that this process satisfies the last equation.

The construction naturally separates according to the partition $[0, \tau] = [0, \sigma) \cup [\sigma, \tau]$. We must specify the distribution of the measures $\mu_l^n$ (or equivalently, the distribution of the random measures $\bar{\mu}_l^n$) for $l = 1, \ldots, N_n$. Recall that in general these measures are allowed to depend on the "past" $\bar{X}_q^n$, $q < l$. However, it will turn out that we can assign $\mu_l^n$ based on just the coloration sequence and the time index $l$ (i.e., "open loop" controls). Let $\{(s_1^m, s_2^m), m = 1, \ldots, M\}$ denote the finite collection of intervals on which $\theta^k_{ij}(t)$ is constant, so that these intervals are nonoverlapping, and $[0, \tau] \setminus \bigcup_{m=1}^M (s_1^m, s_2^m)$ consists of a finite number of points. Suppose $l/n \in [s_1^m, s_2^m)$.

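• If $s_2^m \le \sigma$ then for each $k$ there is $(i, j)$ such that $\theta^k_{ij}(t) = 1$ for $t \in (s_1^m, s_2^m)$. If $Y_l^n = k$ (the ball at time $l$ is color $k$), then $\mu_l^n$ is set to be the uniform distribution on all urns of class $(i, j)$. Note that when we rewrite the relative entropy as in (3.2) there will be equality, and in fact the expected relative entropy of $\bar{\mu}_l^n$ with respect to the uniform measure on urns equals

$$E^n\left[ R\left( \bar{\nu}^n(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) \right] = E^n\left[ \sum_{i=0, j=0}^{I+, J+} \bar{\nu}^n_{ij}(l/n) \log \frac{\bar{\nu}^n_{ij}(l/n)}{\bar{\Gamma}^n_{ij}(l/n)} \right] = -E^n\left[ \log\left( \bar{\Gamma}^n_{ij}(l/n) \right) \right].$$

• If $s_1^m \ge \sigma$ then the controls are no longer "pure." If $y_l = k$ then each $\theta^k_{ij}(t)$ determines a "weight" that should be placed on urns of class $(i, j)$. We let $\mu_l^n$ be the measure which places mass $\theta^k_{ij}(t)$ on the urns of class $(i, j)$, and within this class uses the uniform distribution to apportion mass. We again have equality in the relative entropy in (3.2), and in fact the expected relative entropy of $\bar{\mu}_l^n$ with respect to the uniform measure on urns equals

$$E^n\left[ R\left( \bar{\nu}^n(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) \right] = E^n\left[ \sum_{i=0, j=0}^{I+, J+} \theta^k_{ij}(l/n) \log \frac{\theta^k_{ij}(l/n)}{\bar{\Gamma}^n_{ij}(l/n)} \right].$$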

Note that for the controls constructed in Lemma 4.7 we have $\Gamma_{ij}(t) \ge \delta \sigma^K$ for any $(i, j)$ for which $\theta^1_{ij}(t) \vee \theta^2_{ij}(t) > 0$. Owing to the randomness of the prelimit processes, we cannot guarantee the corresponding result $\bar{\Gamma}^n_{ij}(t) \ge \delta \sigma^K$ for any $(i, j)$ for which $\theta^1_{ij}(t) \vee \theta^2_{ij}(t) > 0$. We therefore use a stopping time argument in the construction of the measures $\mu_l^n$. Let $l_n$ be the first time $l$ such that $\bar{\Gamma}^n_{ij}(l/n) \le \delta \sigma^K / 2$ for some $(i, j)$ for which $\theta^1_{ij}(l/n) \vee \theta^2_{ij}(l/n) > 0$. From time $l_n$ on the construction above is modified, in that the measure is selected so that $\bar{\nu}^n(l/n) = \bar{\Gamma}^n(l/n)$ for $l \ge l_n$. Thus a weight of $\bar{\Gamma}^n_{ij}(l/n)$ is placed on the urns of class $(i, j)$. Note that with this definition $R\left( \bar{\nu}^n(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) = 0$ for $l \ge l_n$.

We now apply Theorem 3.1. Thus given any subsequence we have convergence along a further subsequence as indicated in the theorem, with limit $(\bar{\Gamma}, \bar{x}_1, \bar{x}_2, \bar{\theta}^1, \bar{\theta}^2)$. Using the standard argument by contradiction, it will be enough to prove the convergence of controlled processes and bounds on the relative entropy cost for this convergent subsequence. Let $\gamma^n = (l_n/n) \wedge \tau$. Note that because the applied controls are pure, the process $\bar{\Gamma}^n(t)$ is deterministic prior to $\sigma$, and also that prior to this time, the time derivatives of both $\bar{\Gamma}^n(t)$ and $\Gamma(t)$ are piecewise constant. In fact, the two derivatives are identical except possibly on a bounded number of intervals of length less than $1/n$ (located near the endpoints of the intervals of constancy of $\dot{\Gamma}(t)$). Thus for large $n$ we cannot have $\gamma^n < \sigma$. Since the range of $\gamma^n$ is bounded we can also assume $\gamma^n$ converges (along the same subsequence) in distribution to a limit $\gamma$, and it is easy to check that the limit control processes a.e. satisfy

$$\bar{\theta}^k_{ij}(t) = \begin{cases} \theta^k_{ij}(t) & \text{if } t < \gamma, \\ \bar{\Gamma}_{ij}(t) & \text{if } t \ge \gamma. \end{cases}$$

Owing to the definition of $\gamma^n$, if $\gamma < \tau$ then $\bar{\Gamma}_{ij}(\gamma) = \delta \sigma^K / 2$ for some $(i, j)$. Recall also that $\Gamma_{ij}(t) \ge \delta \sigma^K$ for all $t \in [\sigma, \tau]$.

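Observe that the limit processes all implicitly depend on $a > 0$ through the function $G$. We claim that for each $b > 0$

$$\lim_{a \downarrow 0} P\left\{ d(\bar{\Gamma}, \Gamma) \ge b \right\} = 0.$$

We already know that

$$\lim_{a \downarrow 0} P\left\{ d(\bar{x}, x) \ge 2a \right\} = 0.$$

However, since the rate processes $\theta^k_{ij}$ are all piecewise constant, the integral

$$\int_0^t \left( \dot{\bar{x}}_1 T^1[\theta^1] + \dot{\bar{x}}_2 T^2[\theta^2] \right) ds$$

converges uniformly to

$$\int_0^t \left( \dot{x}_1 T^1[\theta^1] + \dot{x}_2 T^2[\theta^2] \right) ds$$

as $d(\bar{x}, x) \to 0$. Therefore,

$$\lim_{a \downarrow 0} P\left\{ \sup_{0 \le t \le \gamma} \left\| \bar{\Gamma}(t) - \Gamma(t) \right\| \ge b \right\} = 0.$$

If $b > 0$ is sufficiently small, then the following three items, all of which hold under $\gamma < \tau$, form a contradiction:

• $\bar{\Gamma}_{ij}(\gamma) = \delta \sigma^K / 2$ for some $(i, j)$,

• $\Gamma_{ij}(t) \ge \delta \sigma^K$ for all $t \in [\sigma, \tau]$,

• $\sup_{0 \le t \le \gamma} \left\| \bar{\Gamma}(t) - \Gamma(t) \right\| < b$.

We conclude that $\lim_{a \downarrow 0} P\{\gamma < \tau\} = 0$, and therefore $\lim_{a \downarrow 0} P\{d(\bar{\Gamma}, \Gamma) \ge b\} = 0$ for all sufficiently small $b > 0$. It follows that given $b > 0$, for some fixed (sufficiently small) $a > 0$, $\limsup_{n \to \infty} P\{d(\bar{\Gamma}^n, \Gamma) \ge b\} \le b$.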

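We must also consider the relative entropy costs. However, again using the convergence $\lim_{a \downarrow 0} P\{d(\bar{\Gamma}, \Gamma) \ge b\} = 0$ and the dominated convergence theorem,

$$
\begin{aligned}
&\limsup_{a \downarrow 0} \limsup_{n \to \infty} E^n\left[ \frac{1}{n} \sum_{l=1}^{N_n} R\left( \bar{\nu}^n(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) \right] \\
&\quad= \limsup_{a \downarrow 0} \limsup_{n \to \infty} E^n\Biggl[ \sum_{m=1}^{M} \Biggl( \sum_{l : l/n \in [s_1^m, s_2^m)} R\left( \theta^1(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) \left( \bar{x}_1^n((l+1)/n) - \bar{x}_1^n(l/n) \right) \\
&\qquad\qquad + \sum_{l : l/n \in [s_1^m, s_2^m)} R\left( \theta^2(l/n) \,\|\, \bar{\Gamma}^n(l/n) \right) \left( \bar{x}_2^n((l+1)/n) - \bar{x}_2^n(l/n) \right) \Biggr) \Biggr] \\
&\quad= \limsup_{a \downarrow 0} E\left[ \sum_{m=1}^{M} \left( \int_{s_1^m}^{s_2^m} R\left( \theta^1(t) \,\|\, \bar{\Gamma}(t) \right) d\bar{x}_1(t) + \int_{s_1^m}^{s_2^m} R\left( \theta^2(t) \,\|\, \bar{\Gamma}(t) \right) d\bar{x}_2(t) \right) \right] \\
&\quad= \int_0^\tau \left[ \dot{x}_1(t) R\left( \theta^1(t) \,\|\, \Gamma(t) \right) + \dot{x}_2(t) R\left( \theta^2(t) \,\|\, \Gamma(t) \right) \right] dt.
\end{aligned}
$$

When combined with the bound

$$\limsup_{n \to \infty} \frac{1}{n} R\left( \lambda^n \,\|\, \Lambda^n \right) \le J(x),$$

for small enough $a > 0$ we have proved (5.1). Since $\lim_{a \downarrow 0} P\{d(\bar{\Gamma}, \Gamma) \ge b\} = 0$ implies (5.2) for small $a > 0$, the proof is complete. □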